


THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 














Volume XXXII April, 1942 Number 4 








STUDIES OF THE 1937 REVISION OF THE STANFORD- 
BINET SCALE. I. VARIABILITY OF THE IQ AT 
SUCCESSIVE AGE-LEVELS¹


FLORENCE L. GOODENOUGH 
Institute of Child Welfare, University of Minnesota 


That the IQ can have uniform meaning only if it maintains constant 
variation from age to age is an axiom in mental test construction. On 
p. 40 of Measuring Intelligence Terman and Merrill present data on 
this highly important question for the group of approximately three 
thousand cases used in standardizing the two forms of the 1937 
Revision of the Stanford-Binet. This table reveals certain systematic 
trends in IQ variability which, as the authors point out, may have 
arisen either as the result of some undetermined peculiarity of the test 
or from irregularities in the particular sampling of subjects. If the 
explanation lies in the sampling, the irregularity is of no particular 
consequence; but if the apparent discrepancies arise from the selection 
and arrangement of the test-items, the age differences are large enough 
to cause real confusion in the interpretation of test results. 

The discrepancies reported are almost identical for Form L and 
Form M. In both, the highest SD’s occur at ages two and one-half 
and twelve years; the lowest at age six, as shown below: 








Age     SD L-IQ    SD M-IQ
2½      20.6       20.7
6       12.5       13.2
12      20.0       19.5











The downward shift from the peak at two and one-half to the low 
point at age six is exceedingly regular, with no inversions whatever 





1 Assistance in the preparation of the data for this study was furnished by the 
personnel of the Works Progress Administration, Official Project No. 165-1-71-124 
W.P. No. 8760 Sub-project Unit No. 364. 



in the curve for Form M, and but one minor inversion (14.2 — 14.3) 
in Form L. (See Table I, Columns 7-8.) The upward shift from 
age six to age twelve is also fairly regular in both forms though each 
shows one or two slight inversions. For the ages beyond twelve, 
no systematic trend in the magnitude of the standard deviations can 
be noted. The small differences that appear from age to age are thus 
best attributed to sampling. 

The shifts at the earlier ages, however, cannot be dismissed so 
lightly. If results found for other groups correspond roughly to those 
shown in Table I, the effect upon the IQ’s obtained by children who 
deviate considerably from the average is great enough to cause rather 
serious error in IQ interpretation. For example, a child who (on Form 
L) maintained a constant position of +3 SD from the mean of his 
age-group would earn an IQ of 162 at two and one-half years and of 
138 at six years. An IQ of 59 (−2 SD) at age two and one-half would 
correspond to one of 75 at age six or 60 at age twelve. Inasmuch 
as the major purpose of intelligence testing centers about the identifica- 
tion and classification of intellectual deviates, variations of the magni- 
tude indicated are not trivial. The extent of the changes in IQ that 
would result from the age-changes in variational tendency reported 
by Terman and Merrill in individuals who truly maintained a constant 
level of ability are shown graphically in Fig. 1. 
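
The worked figures in this paragraph follow from holding a child's standing fixed in standard-deviation units while the age-specific SD changes. The following is a minimal sketch in Python, assuming (as the example in the text does) a mean of 100 at each age and using the Form L standard deviations quoted above; the function and variable names are illustrative only and do not come from the original study.

```python
import math

# Form L standard deviations quoted in the text (from Terman and Merrill).
FORM_L_SD = {"2.5": 20.6, "6": 12.5, "12": 20.0}


def iq_at_constant_standing(z, sd, mean=100.0):
    """IQ earned by a child who stays z standard deviations from the mean.

    Assumption: the mean is 100 at every age, as in the worked example.
    """
    return mean + z * sd


def round_half_up(x):
    """Round to the nearest whole IQ point, with halves rounded upward."""
    return math.floor(x + 0.5)


for z in (3, -2):
    row = {age: round_half_up(iq_at_constant_standing(z, sd))
           for age, sd in FORM_L_SD.items()}
    print(f"z = {z:+d}: {row}")
```

Under these assumptions the sketch reproduces the values cited in the text: 162 and 138 for a standing of +3 SD at ages two and one-half and six, and 59, 75, and 60 for a standing of −2 SD at ages two and one-half, six, and twelve.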

The authors correctly point out that the only crucial test as to the 
source of these discrepancies (whether in the test itself or in the 
character of the standardization sample) would come from a longi- 
tudinal study of the shifts in IQ variability of the same group of 
children tested at successive ages. However, inasmuch as some years 
must elapse before the entire period from two and one-half to twelve 
years can be covered by a longitudinal study, it seems worth while 
to see what light cross-sectional studies of other groups can throw on 
this important problem. If the age-differences in variability of such 
groups follow the same pattern as appeared in the standardization 
group, the likelihood that the original discrepancies were due to 
biased sampling is small and the source of the error should be sought 
in the test itself. If no such trends appear, the hope expressed by the 
authors that the true variability of the scale does not change with age 
receives some support. 

In the time that has elapsed since the appearance of the revised 
scale, a fairly large number of subjects have been given either Form L 
or Form M by the members of our staff. In some respects these 










children do not represent the optimal kind of sampling for testing the 
hypothesis in question. They belong, for the most part, to the upper 
socio-economic levels, and they have had a variable amount of experi- 
ence in taking other tests. However, inasmuch as our present interest 
centers about the comparability of the standard deviations from age 
to age, the importance of maintaining a sampling similar to that of 
the general population is subordinate to that of making sure that the 
samples used at each of the various age levels are comparable with 
each other in respect to factors that are likely to cause differential 
variability. Probably the most important of these factors are (a) 
socio-economic status, (b) amount of previous test experience, (c) form 
of the test (Form L or Form M), and possibly (d) sex. 

Fig. 1.—Age changes in IQ at constant levels of variability. 
[Figure not reproduced: IQ plotted against age.] 

Except for the possibility that some of the children may have been 
given tests at school or by other agencies of which we have no knowl- 
edge, information on all four of these points is available for all cases 
tested by us. Matched groups were made up which were closely equal- 
ized in respect to socio-economic status, fairly closely in respect to 
amount of previous test practice,¹ and approximately balanced as to 





1 Amount of previous experience with the 1937 Stanford test was approximately 







sex. Because of the relatively small number of cases at each age that 
could meet the necessary requirements for matching, and inasmuch 
as Terman and Merrill report almost identical standard deviations 
for the two forms of the test (see Table I), it was decided not to 
attempt to separate the forms, since the reduction in numbers would 
make separate computation of the standard deviations very unreliable. 
The proportion of the two forms is very similar throughout the age- 
range. At each age about two-thirds of the cases had been given 
Form L, about one-third Form M. 

As finally made up, each of the separate age-groups conforms fairly 
closely to the following criteria for matching: 

A. Socio-economic Status.¹—Group I, forty per cent of all cases; 
Groups II and III, thirty-five per cent; Groups V, VI, VII, twenty-five 
per cent. 

B. Previous Test Experience on 1937 Stanford.—About seventy-five 
per cent had never been given this test before; about twenty per cent 
had one previous testing; about five per cent, two or more. 

C. Previous Experience with Other Tests.—About twenty per cent 
had had none (as far as we know); about forty per cent had had one; 
about forty per cent, two or more.² 

D. Sex.—About fifty-one per cent were boys, forty-nine per cent 
girls. 

E. Form of Test Given.—About sixty-seven per cent of the tests 
were Form L, thirty-three per cent Form M. 





similar for all ages, but it was not possible to control this factor absolutely when 
tests other than the Stanford were taken into account. Many of the older children 
had been used as subjects in the Minnesota study of mental growth. These 
children had been tested at annual intervals from early childhood. As a necessary 
consequence, the older children in this group had many more tests, previous to the 
1937 Stanford, than had the younger ones. Exact equalization of the age groups 
in respect to this factor would have meant the elimination of so many of the older 
children that the results would have had little meaning because of the few cases. 
It will be shown in a later report that when tests of the Binet type are separated by 
an interval of several months (as is the case here) the effect of previous practice is 
so slight as to be negligible. It seems improbable, therefore, that the failure to 
maintain complete equality of the several age groups in respect to the amount of 
previous test-practice has had any material effect upon the standard deviations. 

1 Minnesota Occupational classification. No rural cases (Group IV) were 
included. 

2 The failure to maintain complete matching of the age-groups mentioned in the 
preceding footnote is largely confined to the third group. At the earlier ages 
“two or more” commonly means “two only”; at the older ages, “more than two.” 













Table I shows the means and standard deviations obtained for 
these groups by ages separately. We have not divided the sexes, 
inasmuch as Terman and Merrill report no consistent or reliable sex 
differences, and their data on variability are given only for the sexes 
combined. 


TABLE I.—MEANS AND STANDARD DEVIATIONS OF IQ AT SUCCESSIVE AGE LEVELS

         Children from Minnesota study     Standardization group
         of mental growth. Tests by        (reported by Terman and Merrill)¹
         trained psychometrists

Age      N      M      SD          N      M      SD (Form L)   SD (Form M)
2        49     118    16.8        102    102    16.7          15.5
2½                                 102    105    20.6          20.7
3        91     120    17.0        99     103    19.0          18.7
3½                                 103    104    17.3          16.3
4        120    123    14.4        105    99     16.9          15.6
4½                                 101    101    16.2          15.3
5        100    119    12.7        109    102    14.2          14.1
5½                                 110    98     14.3          14.0
6        64     120    14.2        203    100    12.5          13.2
7        64     122    16.0        202    101    16.2          15.6
8        61     126    19.4        203    101    15.8          15.5
9        56     129    18.8        204    104    16.4          16.7
10       62     128    16.3        201    104    16.5          15.9
11       63     127    17.9        204    102    18.0          17.3
12       55     127    17.7        202    101    20.0          19.5
13       47     130    16.3        204    102    17.9          17.8
14       32     124    18.4        202    100    16.1          16.7
15       28     128    17.9        107    102    19.0          19.3


























1 The means cited here are the raw values for the L and M Composite as given 
in Table 6, p. 36 of Measuring Intelligence. The SD’s and the N’s are from Table 7, 
p. 40 of the same book. 


Because of the small number of cases, we have combined the half- 
year age-groups at the earlier levels. It will be noted also that (as 
was to be expected from the distribution of paternal occupations), 
the mean IQ’s of our group are decidedly above average at all ages. 
From age two through age seven the means show only small diver- 
gences from a constant value of 120; from age eight on they fall within 
the range of 124-130. The median values are 120 at the lower ranges, 
and 127.5 at the older ages. This shift in test standing does not 
correspond to any known factors either in the selection of subjects 




or in the conditions of testing. It may possibly be a result of the fact, 
previously mentioned, that the older children had had somewhat more 
experience than the younger ones in the taking of tests of different 
kinds, even though there was no difference between the groups in 
amount of experience with the Stanford. Opposed to this hypothesis 
is the fact that a fairly large number of the children have been given 
tests by public school examiners at the end of the kindergarten period, 
as a check upon their probable ability to do the work of the first grade. 
This is the only known point on the age-curve at which a large increase 
in the amount of previous test experience occurs. If the changes in 
mean IQ arose from the effect of practice, one would expect the sudden 
increase to appear at the age of six years, but it does not. Moreover, 
as was previously stated, data to be presented elsewhere indicate that 
under the conditions of testing that prevailed for our cases, the effect 
of practice may be safely ignored. This statement, of course, does 
not hold good for all tests nor for all testing conditions, but it is in 
general agreement with many previous reports on the constancy of 
Binet IQ’s when an interval of several months elapses between testings. 

A more plausible explanation for the shifts in mean IQ for this 
group of superior children is to be found in the shifts in the standard 
deviations. When allowance is made for the mental acceleration of 
our group, it is evident that the general trend corresponds fairly closely 
with that found for the standardization group as reported by Terman 
and Merrill, except for the fact that the peak at the earliest ages is less 
pronounced. This difference is in all probability a result of special 
selection of the younger children in our group. A large proportion of 
these cases were candidates for admission to our nursery school. This 
almost of necessity means some curtailment at the lower end of the 
intellectual distribution, since the parents of a retarded two-year-old 
are unlikely to try to place him in a nursery school such as ours.¹ 
Between the ages of three and six years, the drop in the standard 
deviations appearing in the standardization group is closely paralleled 
by our figures, except that the low point comes about a year earlier. 





1 Even if the attempt is made, if the preliminary interview with the mother 
discloses such evidences of backwardness as inability to talk, lack of training in 
toilet habits and so on, the chances are that the parents will be advised to keep the 
child at home for a time until he has become more mature. If such a child is 
brought for examination at all, he will, as a rule, be referred to our clinical division 
for complete study. Cases from this division have not been included in the dis- 
tributions upon which Table I is based, because of bias in their selection. 










This is likewise to be expected on the basis of the advanced intellectual 
level of these children. Thereafter the variability again increases. 

Our data fail to show the second peak at age twelve or thereabouts 
that appears in the original group. It may be noted, however, that 
this peak is not so sharply defined in that group as is the low point at 
six years. At age fifteen, the variability of the standardization group 
is almost as great as it is at age twelve. This suggests that the 
increased dispersion at age twelve may be actually only a fluctuation 
of sampling. It is possible that the only real departure from constant 
variability occurs at about the six-year-level. It is also possible, 
however, that the non-appearance of the peak at age twelve may be a 
peculiarity of our sampling of subjects. The fact that a secondary 
peak at age twelve is shown in our second group of cases lends support 
to this hypothesis. 

Table II summarizes the results of tests given by graduate students 
enrolled in an advanced course in mental testing. These students 
had all had previous training in testing before admission to the course 
and had completed one full quarter of practice testing under close 
supervision before beginning the work in the public schools upon which 
the data of Table II are based. Only those students who had previ- 
ously demonstrated considerable competence in testing took part in 
this survey. Thus, although we have thought best to differentiate 
between the tests given by students and those given by the regular 
psychometrists of the Institute staff which form the basis for Table I, 
it should be noted that at the time these tests were given, the student- 
examiners were by no means inexperienced in testing. All were well 
toward the end of their period of training after which many of them 
have gone directly to regular, full-time positions as psychometrists 
or clinical psychologists. 

Table II presents the results obtained from a survey of two public 
elementary schools and one parochial school, all located in the south- 
east section of Minneapolis, not far from the University. The mean 
IQ’s are undoubtedly affected by the relatively large proportion of 
the children whose parents are faculty members or University students. 
They are, however, lower than those found for the children of the 
growth study. This is, we believe, a difference in the character of the 
populations, inasmuch as the proportion of children from the upper 
socio-economic classes, while greater than that of the general popula- 
tion, is nevertheless considerably below that found in the growth study. 
Moreover, these schools include a goodly sprinkling of children of 







day-laborers and slightly skilled workmen—classes that have only 
meager representation in the former group. 


TABLE II.—MEANS AND STANDARD DEVIATIONS OF IQ AT SUCCESSIVE AGE LEVELS
(ELEMENTARY SCHOOL CHILDREN. TESTS BY GRADUATE STUDENTS.)

Age    Form of test¹    N      M      SD
5      L-M              54     114    14.0
6      L                76     111    14.2
6      M                25     112    13.8
6      L-M              101    111    14.4
7      L-M              78     114    16.7
8      L-M              76     115    16.9
9      L                48     112    14.8
9      M                20     108    16.8
9      L-M              68     111    16.8
10     L-M              58     114    19.8
11     L-M              44     116    16.8
12     L-M              37     111    21.9
13     L-M              42     116    16.8








1 At each age from two-thirds to three-fourths of the tests were Form L, the 
remainder, Form M. Only at the ages of six and nine years was the number of 
Form M tests large enough to warrant separate computation. We have, however, 
prepared separate frequency distributions for the two forms at all ages in order to 
see whether or not any significant differences were present. The distributions 
throughout were found to be quite as similar as those shown at ages six and nine in 
the table. 


Here, too, the shifts in the magnitude of the standard deviations 
follow the same pattern as those found for the standardization group. 
The second peak at age twelve again appears. Although the number 
of cases at that age is not large, the coincidence is certainly worth 
consideration. 
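
The caution about the small number of cases at age twelve can be made concrete with the usual large-sample approximation for the standard error of a standard deviation, SD/√(2N). This is a hedged sketch only: the approximation assumes roughly normal IQ distributions, it is not a computation reported in the original study, and the function name is illustrative. The N and SD values are those of Table II.

```python
import math


def approx_se_of_sd(sd, n):
    """Approximate standard error of a sample SD: sd / sqrt(2n).

    Assumption: large-sample formula under a normal distribution.
    """
    return sd / math.sqrt(2 * n)


# (age, N, SD) for the three oldest age groups in Table II.
table_ii_rows = [(11, 44, 16.8), (12, 37, 21.9), (13, 42, 16.8)]

for age, n, sd in table_ii_rows:
    se = approx_se_of_sd(sd, n)
    print(f"age {age}: SD = {sd:.1f}, approximate SE of SD = {se:.2f}")
```

With only 37 cases the approximate standard error at age twelve is about two and one-half IQ points, which is why the peak is better read as suggestive than as conclusive.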

Relatively few studies on this topic have thus far appeared in the 
literature. The only article I have been able to locate which provides 
material on the variability of the IQ’s obtained from the 1937 Revision 
over a fairly wide range of ages and for sufficiently large groups is by 
Ebert.¹ She was chiefly concerned with a comparison of the predictive 





1 Ebert, Elizabeth H.: “A Comparison of the Original and Revised Stanford-Binet 
Scales.” Journal of Psychology, 1941, XI, pp. 47-61. 










values of the 1916 and the 1937 Revisions. However, she included 
standard deviations in her tables of correlations. The subjects of her 
study were children enrolled in the longitudinal investigation of mental 
growth in progress at the Brush Foundation, Western Reserve University. 
The groups listed in Table III represent various samplings 
from a total population of three hundred fifteen children. Undoubtedly 
there is overlapping of cases from sample to sample, but the extent 
of the overlap is not indicated. 


TABLE III.—MEANS AND STANDARD DEVIATIONS OF IQ’S ON THE 1937 REVISION
OF THE STANFORD-BINET AS REPORTED BY EBERT¹

                                          Median for age
Group    Age    N      M        SD        M        SD
1        6      146    116.8    13.7
2        6      91     116.0    13.5      116.4    13.6
3        7      188    120.1    15.2
4        7      133    118.5    14.8
5        7      42     114.4    12.3      119.3    15.0
6        7      146    121.8    15.5
7        8      174    124.0    16.7
8        8      83     123.5    15.2
9        8      41     124.7    14.6      124.0    16.7
10       8      91     124.5    17.9
11       8      133    123.8    17.3
12       9      120    124.9    17.0
13       9      78     124.8    17.7
14       9      37     122.7    18.5      124.9    17.6
15       9      60     125.7    17.6
16       9      42     125.0    15.7
17       10     60     129.6    16.4
18       10     41     130.2    16.4      129.9    16.4

1 Op. cit. 


The trends revealed in Table III are very similar to those from our 
growth study for groups of corresponding age. In both cases, there is 
an increase in mean IQ with advancing age between the ages of six 
and ten years. In both instances, the standard deviations at the 







six-year level are distinctly lower than those for the ages from eight 
years upward. In both, the seven-year value is intermediate between 
those immediately preceding and following it. Ebert gives no figures 
for the ages earlier than 6 years, which makes it impossible to say 
whether or not our finding of an age-acceleration in the appearance 
of the lowest point on the variability curve would have been duplicated 
for this second group of superior children. It will be noted that the 
mean IQ’s of Ebert’s cases are approximately the same as those shown 
in Table I for the children studied here. 

That the increase in mean IQ for these two groups of superior 
children is quite in line with expectation on the basis of the increase 
in the standard deviations will be immediately apparent from inspec- 
tion of Fig. 1. The fact that the means for the public school children 
do not show such an increase is in all probability due to the fact that 
the latter is more nearly an average group. Among our growth study 
cases, almost no children rank below 100 IQ. Thus an increase in 
variability due to the character of the test would bring about an 
apparent increase in the IQ’s of practically all the cases, with little or 
no compensating effect from the levels below 100. In an average 
group, the increased spread would take place in both directions equally 
(assuming a normal distribution) and the group means would therefore 
not be affected. 
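
The reasoning of this paragraph can be illustrated with simple arithmetic. The sketch below is illustrative only: the two sets of standings (in SD units) are hypothetical, a center of 100 is assumed at every age, and the standard deviations 12.5 and 15.8 are the Form L values for ages six and eight from Table I.

```python
from statistics import mean

# Hypothetical standings in SD units; these are not data from the study.
SUPERIOR_GROUP = [0.5, 1.0, 1.5, 2.0, 2.5]    # every child above the mean
AVERAGE_GROUP = [-2.0, -1.0, 0.0, 1.0, 2.0]   # symmetric about the mean


def group_mean_iq(standings, sd, center=100.0):
    """Mean IQ of a group whose members hold fixed standings in SD units."""
    return mean([center + z * sd for z in standings])


for label, group in (("superior", SUPERIOR_GROUP), ("average", AVERAGE_GROUP)):
    at_six = group_mean_iq(group, 12.5)
    at_eight = group_mean_iq(group, 15.8)
    print(f"{label}: mean IQ {at_six:.1f} at SD 12.5, {at_eight:.1f} at SD 15.8")
```

In the symmetric group the gains above the mean are offset by equal losses below it, so the group mean stays at 100; in the group lying wholly above the mean, every IQ rises with the SD and the group mean rises with it.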


SUMMARY AND CONCLUSION 


The changes in IQ variability on the 1937 Revision of the Stanford- 
Binet Scale originally reported by the authors have been substantiated 
by results obtained from three independent sources. Variability is 
lowest at about the age of six years, highest at two and one-half to 
three years and probably at twelve years. The differences are great 
enough to produce spurious age-changes of from 15-20 IQ points in 
the apparent standing of an individual child who ranks from 2-3 
SD above or below the mean of his group. Evidence presented in 
this study shows that the difference in variability may bring about 
changes as great as 8-12 points in the mean IQ of groups made up 
chiefly of children of superior ability. Presumably, similar changes 
would occur in groups of retarded children. 

A number of investigations showing the general superiority of the 
1937 Revision over the 1916 Scale have appeared in the literature. 
Briefly, these investigations demonstrate the following points of advan- 
tage: Higher correlations and smaller IQ changes between original 










test and retest; no general tendency for the IQ to drop at the older 
ages because of errors in standardization as was true of the earlier 
scale; more complete sampling of abilities with consequent better basis 
for clinical interpretation of test results. That the test can stand up 
so well in these respects in spite of the error pointed out in this paper 
speaks well for its general merit. Moreover, the error described here, 
while serious, is one that can be corrected by any one of several 
procedures. Some re-arrangement of the test items might be the 
only change necessary. That such re-arrangement may be desirable 
is suggested by several reports indicating that certain items are at 
present incorrectly placed. Another possibility lies in the conversion 
of test scores into standard scores.¹ If it is thought desirable to make 
use of the widespread familiarity with the concepts of IQ significance 
that have been developed throughout the educational world, these 
standard scores may then be re-converted into values of constant 
magnitude by a slight modification of the formula commonly used for 
expressing measures of different functions in terms of a uniform stand- 
ard. The usual procedure consists in assigning a constant value of 
50 to the mean, while the SD is set at 10. An individual score then 
becomes: 


X = 50 + 10z. 


The only modification needed to adapt this formula to the ordinary 
distribution of intelligence quotients is a change in the constants as 
shown below: 


IQ equivalent = 100 + 17z. 


The use of 17 as a constant value for the standard deviation is based 
upon the assumption of this value in the table mentioned in the foot- 
note. That some such adjustment is called for seems clearly indicated 
by the evidence of this study. 
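
The proposed adjustment can be sketched as a two-step conversion: express the obtained IQ as a standard score z for the child's own age, then apply IQ equivalent = 100 + 17z. In the minimal Python sketch below, the function name and the per-age mean of 100 are assumptions made for illustration; the standard deviations are the Form L values cited earlier.

```python
def iq_equivalent(iq, age_mean, age_sd, target_mean=100.0, target_sd=17.0):
    """Convert an obtained IQ to an equivalent on a constant-SD scale.

    z is the child's standing relative to his own age group; the result
    is target_mean + target_sd * z, i.e. 100 + 17z with the defaults.
    """
    z = (iq - age_mean) / age_sd
    return target_mean + target_sd * z


# A child holding a standing of +3 SD on Form L (age means assumed to be 100):
# obtained IQ 161.8 at age two and one-half (SD 20.6), 137.5 at age six (SD 12.5).
print(iq_equivalent(161.8, 100.0, 20.6))
print(iq_equivalent(137.5, 100.0, 12.5))
```

Both obtained IQs map to about 151 on the adjusted scale, so a child whose relative standing has not changed no longer appears to have lost some 24 points between the two ages.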





1 It should be noted that the standard scores given on p. 42 of Measuring 
Intelligence do not meet this problem, because they are based upon the assumption 
that variability does not change with age. It is, however, quite feasible to draw 
up a table of standard score equivalents, taking unequal variability into account, 
that would correct for the age factor. 








MEASUREMENT AND DEVELOPMENT OF THE 
FINGER SCHEMA IN MENTALLY RETARDED 
CHILDREN; RELATION OF ARITHMETIC 
ACHIEVEMENT TO PERFORMANCE ON THE 
FINGER SCHEMA TEST* 


HEINZ WERNER AND DORIS CARRISON 
Wayne County Training School, Northville, Michigan 


One important genetic problem of the perception of space and 
spatial relationships concerns body orientation. Several investi- 
gations**:!4 have shown that the child gradually learns to differentiate 
between the parts of his body, and to localize them as to left and right, 
before and behind, etc. 

For the analysis of the perception of the body Schilder,® Gerst- 
mann’? and others have introduced the term “‘ body schema,” referring 
to a subjective spatial image of one’s own body. It is a body-orienta- 
tional plan, either consciously or unconsciously maintained, which 
involves the single parts of the body in their spatial relationships to 
each other. 

The fingers of each hand are a highly specialized area of the body 
schema. Most clinical work on the ability to discriminate between 
the fingers has been done in psychopathology. This ability, as has 
been reported by Gerstmann in 1924, may be impaired as a consequence 
of a brain lesion—a defect which is known as “finger agnosia.” Since 
Gerstmann’s contribution, many other cases of deficiency in the 
recognition of the fingers have been described. It is important to 
note that frequently this condition is associated with an inability to 
cope with numbers. 

Recently, Strauss and Werner'?'*'5 have introduced the term 
“finger schema,” denoting a spatial image of the fingers as differentiated 
parts of the hand. They were interested particularly in an 
examination of functions bearing on educational retardation. They 
presented evidence that deficiency of the finger schema occurs in 
children and may be one of the factors impeding the learning progress, 
especially in the subject of arithmetic. In the course of a functional 
analysis of severe arithmetic disability in mentally retarded children, a 





*The authors acknowledge their indebtedness to Dr. Robert H. Haskell, 
Medical Superintendent of the Wayne County Training School. They also 
express their appreciation to Dr. Sidney W. Bijou of the Research Department for 
valuable suggestions and for the reading of the manuscript. 



finger schema test was devised. The results suggest that impairment 
in the finger schema as revealed by this test has a reference to arith- 
metic disability. 

It is to be expected that the finger schema has an ontogenetic 
development. In building up the body image as well as the finger 
schema, as is evident from clinical investigations, tactile-kinesthetic, 
optical, and verbal factors are involved. 

The present study, carried out with mentally retarded children of 
different mental ages, deals mainly with the genetic aspect of finger 
schema as measured by a specially constructed test battery; also, 
further research was performed on the relation between finger schema 
and arithmetic achievement. 


THE FINGER SCHEMA TEST 


In the course of a lengthy search for a test which would enable the 
clinical examiner to detect marked deviations in the ability to differen- 
tiate fingers, a Finger Schema Test was constructed. The essential 
feature of the test is the same throughout the various test items; 
namely, the child is required to localize fingers which have been 
indicated by the examiner. In one item, for instance, the examiner 
touches a certain finger while the child’s eyes are closed, after which 
the child is required to point to the finger indicated. The variation 
from item to item involves the number of the fingers indicated (one or 
two fingers); the means by which the examiner or the child indicates 
the fingers (either by naming or by touching); and the directness 
of the performance (whether the child indicates directly on his own 
hand, on the hand of the examiner, or on a picture of a hand). In the 
present form the test includes the following ten items:* 

Item 1.—The child is asked to lay his right arm and hand on the 
table with his fingers spread. He is asked to indicate with the pointing 
finger of the left hand a particular finger of the right hand named 
by the examiner; for instance, “Show me finger two.” This is 
repeated with several fingers. The whole series is repeated on the left 
hand with the right hand doing the pointing. The order of the fingers 
asked for is: 

Right hand: 1; 4; 2; 3; 5. 

Left hand: 1; 4; 2; 3; 5. 





* The original test included thirteen items. Three were eliminated because 
they were deemed to be extraneous to the perception of the finger schema. 







Item 2.—Same as the above, only two fingers are asked for at a 
time; for example, “Show me fingers one and five.” The child is asked 
to point to the fingers successively, not simultaneously, when making 
responses. The order of the fingers named by the examiner is: 

Right hand: 1-5; 3-2; 4-1; 5-4; 2-4. 

Left hand: 4-5; 5-3; 1-4; 4-3; 2-5. 

Item 3.—The child is asked to indicate with the pointing finger of 
the left hand the finger that is touched by the examiner on the right 
hand. This is repeated on the left hand. The order of the fingers 
touched is: 

Right hand: 1; 3; 4; 2; 5; 2; 5; 2; 4; 3. 

Left hand: 1; 3; 4; 2; 5; 2; 5; 2; 4; 3. 

Item 4.—Just like the above item, only two fingers are touched 
successively at a time. The order is: 

Right hand: 1-5; 3-2; 4-1; 5-4; 2-4; 4-5; 5-3; 1-4; 4-3; 2-5. 

Left hand: 1-5; 3-2; 4-1; 5-4; 2-4; 4-5; 5-3; 1-4; 4-3; 2-5. 

Item 5.—The child’s finger is touched, and he tells its number. 
The order of the fingers touched is: 

Right hand: 3; 4; 2; 5; 2. 

Left hand: 5; 2; 4; 3; 1. 

Item 6.—Same as the above item only two fingers are touched 
successively. These fingers are touched in this order: 

Right hand: 2-5; 4-3; 1-4; 5-3; 4-5. 

Left hand: 2-4; 5-4; 4-1; 3-2; 1-5. 

Item 7.—The child is told to lay his right hand on the table opposite 
to the examiner’s right hand, and to spread his fingers. Then, one 
of the child’s fingers is touched, and he opens his eyes to touch the same 
finger on the examiner’s hand. Directions similar to the following are 
given: “I touch yours and you touch mine as in a game.” Care is 
taken to prevent the child from touching his own finger before he 
touches the examiner’s. A preliminary practice period is 
especially necessary with the younger children. The fingers touched 
are: 

Right hand: 1; 3; 5; 2; 4; 2; 3; 1; 4; 2. 

Left hand: 1; 3; 5; 2; 4; 2; 3; 1; 4; 2. 

Item 8.—Same procedure as in the above item only with two 
fingers successively touched. The order is: 

Right hand: 1-4; 3-5; 1-4; 4-1; 5-2; 1-5; 4-1; 5-3; 4-2; 5-2. 

Left hand: 1-4; 3-5; 1-4; 4-1; 5-2; 1-5; 4-1; 5-3; 4-2; 5-2. 







Item 9.—The examiner traces the child’s hands on the black- 
board. The left hand is covered with a piece of cardboard. The 
child, facing the blackboard, holds out his right hand to be touched 
by the examiner. The child is supposed to indicate which of his 
fingers is touched by touching the corresponding finger on the tracing. 
After the touchings are completed on the right hand, the cardboard is 
removed from the left tracing and placed over the right. The order 
of the touching is: 

Right hand: 1; 3; 4; 2; 5; 2; 5; 2; 4; 3. 

Left hand: 1; 3; 4; 2; 5; 2; 5; 2; 4; 3. 

Item 10.—The last item of the Finger Schema Test is the same as
the above item except that two fingers are touched. The order of
presentation is:

Right hand: 1-5; 3-2; 4-1; 5-4; 2-4; 4-5; 5-3; 1-4; 4-3; 2-5. 

Left hand: 1-5; 3-2; 4-1; 5-4; 2-4; 4-5; 5-3; 1-4; 4-3; 2-5. 

A few general remarks concerning the administration of the items 
may be added. 

1. Order of Numbering the Fingers—The number designations 
follow this system: 


      Left Hand Fingers                      Right Hand Fingers
  5      4      3      2      1          1      2      3      4      5
little, ring, middle, index, thumb     thumb, index, middle, ring, little

If a child has acquired a different method of numbering his fingers, e.g.,
counting from the little finger as one to the thumb as five; or taking the
pointing finger as one, the middle finger as two, the next-to-last
finger as three, and the little finger as four, with the thumb not
numbered, the examiner uses the child’s system and transposes the
numbers of the fingers to comply with the system used in the test; this
is necessary in the verbal test items one, two and five, six.
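
The transposition between a child's own numbering and the test numbering can be sketched as follows. This is only an illustration of the two alternative systems mentioned above; the function names are hypothetical and are not part of the test as published.

```python
# Test numbering (from the text): thumb = 1, index = 2, middle = 3,
# ring = 4, little finger = 5.

def from_little_finger_first(n: int) -> int:
    """Child counts little finger = 1 ... thumb = 5; return the test number."""
    if n not in range(1, 6):
        raise ValueError("finger number must be 1-5")
    return 6 - n

def from_index_first(n: int) -> int:
    """Child counts index = 1 ... little finger = 4 and leaves the thumb
    unnumbered; return the test number."""
    if n not in range(1, 5):
        raise ValueError("finger number must be 1-4 in this system")
    return n + 1

# A child who counts from the little finger and answers "one"
# has named test finger five.
print(from_little_finger_first(1))
```

Because the first mapping is its own inverse, the same function converts in either direction between the test numbering and that child's numbering.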

2. Method of Touching the Fingers.—In the items where the 
examiner indicates the fingers by touching, a knitting needle with a 
smooth, round steel point is used.* The fingers are touched between 
the nail and first joint with a slight pressure. 

3. Closing and Opening the Eyes.—In all items the child’s eyes are 
closed when the fingers are indicated, and with the exception of items 
one and two, are opened during the time of response. Especially in 
those items where two fingers are touched successively, the examiner 
insists that the child keep the eyes closed until after the second touch. 











* We used a needle with a rounded point of 42 of an inch in diameter. 













4. Method of Scoring.—For each child, a test form is used for listing 
the order of fingers named or touched for each item to facilitate record- 
ing errors. It is to be noted that in the two-finger tests, the child 
scores one error whether one or both fingers are missed. However, a 
mere change of succession in the child’s response is not counted as an
error.

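The scoring rule just described can be sketched in a few lines. This is a minimal illustration, assuming each trial is recorded as the fingers presented and the fingers the child indicated; the names `is_error` and `error_count` are hypothetical.

```python
def is_error(presented, response):
    """One error if any finger is missed; order of response is ignored."""
    return sorted(presented) != sorted(response)

def error_count(trials):
    """trials: list of (presented, response) pairs for one item."""
    return sum(is_error(p, r) for p, r in trials)

trials = [
    ((1, 5), (5, 1)),  # succession reversed only: not an error
    ((3, 2), (3, 4)),  # one finger missed: one error
    ((4, 1), (2, 5)),  # both fingers missed: still one error
]
print(error_count(trials))
```

Sorting both tuples before comparing is what makes a mere change of succession count as correct, while a trial contributes at most one error however many fingers are missed.
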
RELATION BETWEEN FINGER SCHEMA AND MENTAL AGE 


The Finger Schema Test was administered to eighty mentally- 
retarded children from the Wayne County Training School, North- 
ville, Michigan. The mental retardation of these subjects is of the 
so-called endogenous or familial type; children whose mental impair- 
ment is due to brain-injury were excluded. The mean IQ taken from 
the revised Stanford-Binet was 68.5, the median 68.4, and the inter- 
quartile range 64.4 to 73.4. The entire group was divided into four 
subgroups according to mental age. As Table I shows, there is the 
same number of children in each group; the means of the mental ages 
are practically at the mid-point, and the mean IQ’s are almost the
same.

TABLE I.—THE FOUR AGE GROUPS SHOWING THE MEAN MENTAL AGES AND IQ’S

MA        Number    Mean MA    Mean IQ
6/7         20        6.4        66.8
7/8         20        7.5        68.3
8/9         20        8.4        68.7
9/10        20        9.5        70.2














Figure 1 illustrates the general inverse relationship between error 
scores and mental age. As expected, the difficulty of the single items
depends on the complexity of the task; hence, items in which two fingers
are involved are harder than those with one finger. For example, the 
touching items eight and ten are the most complicated and also the 
most difficult. 

Fig. 1.—Mean error score in terms of per cent for each item and mental age. Each line
refers to one of the ten items of the test. (Horizontal axis: MA 6/7 to 9/10;
vertical axis: errors in per cent.)

Items in which the finger is designated by stating its number may
be called “verbal” items and those in which the child only points at
the finger “non-verbal.” When the verbal test items one, two and
five, six are compared with the non-verbal items three, four, one finds
a more rapid decrease of errors with increases in mental age in the
former. If the scores for the two younger and the two older mental-age
groups are combined, respectively, the errors on verbal item two
drop from 30.5 per cent to 7.5 per cent while the drop on the non-verbal
item four is only from 39.0 per cent to 28.0 per cent.* A
similar, but less marked, trend can be found if verbal item six, in which
the child’s task is to name the fingers, is compared with the same
non-verbal item four: the decrease in errors in the verbal item is from
49.0 per cent to 28.5 per cent, while in the non-verbal item it is from
39.0 per cent to 28.0 per cent. This probably indicates the increasing
importance of language factors in the development of the finger
schema. The mental age level of eight seems to be the critical age
for this development.

* For this comparison only the two-finger items are mentioned because of the
small number of errors on the one-finger items.
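
The comparison of verbal and non-verbal items can be restated as simple differences in percentage points. The sketch below uses only the error percentages quoted in the text; the dictionary layout and the name `drop` are illustrative assumptions.

```python
# Error percentages for the combined younger (MA 6 to 8) and
# older (MA 8 to 10) groups, as quoted in the text.
ERRORS = {
    "item 2 (verbal)": (30.5, 7.5),
    "item 6 (verbal)": (49.0, 28.5),
    "item 4 (non-verbal)": (39.0, 28.0),
}

def drop(younger: float, older: float) -> float:
    """Decrease in error rate, in percentage points."""
    return round(younger - older, 1)

for name, (younger, older) in ERRORS.items():
    print(f"{name}: {drop(younger, older)} points")
```

On these figures both verbal items fall by about twenty points or more, while the non-verbal item falls by eleven.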





Fig. 2.—Error scores in terms of weighted mean for each mental age. (Horizontal
axis: MA; vertical axis: errors.)


Table II presents the composite mean error score for each mental
age. These results are shown graphically in Fig. 2.

TABLE II.—WEIGHTED MEAN ERROR SCORES PER MENTAL AGE

MA                  6/7       7/8       8/9       9/10
Weighted mean       4.24      2.85      1.59       .77
σ                   4.95      3.87      3.51      3.02
t                        1.42      1.40       .86
P                         .18       .18       .40

The composite mean score was obtained as an average of the means of the
scores on each item weighted on the basis of reliability.* A steady
decrease is


* The formula (4) used was: $wM = \dfrac{M_R\left(\frac{1}{\sigma_R}\right) + M_L\left(\frac{1}{\sigma_L}\right)}{\frac{1}{\sigma_R} + \frac{1}{\sigma_L}}$


revealed in the total composite error scores with increase in mental 
age. The difference between the successive mental age levels is not 
significant at the five per cent level, according to Fisher’s t-test.*
The results of combining two mental age levels appear in Table 
III: the difference between the six to eight and eight to ten mental 
ages is significant at the five per cent level. 

In conclusion, it can be said that the test suggests a genetic rela- 
tionship between the accuracy of performance on the Finger Schema 
Test and mental age. 


TABLE III.—WEIGHTED MEAN ERROR SCORES FOR COMBINED TWO YOUNGER
AND TWO OLDER MENTAL AGES

MA                  6/8       8/10
Weighted mean       3.5       1.1
σ                   4.4       3.2
t                        2.0
P                        .05











RELATION BETWEEN FINGER PREFERENCE AND MENTAL AGE 


The foregoing results reflect an increasing ability to differentiate 
between the fingers as the child grows older mentally. Little can be 
said at present about the factors responsible for such a development, 
but some evidence may be mentioned which seems to have a bearing 
on the problem. It refers to certain preferences in the choice of 
fingers. The following analysis of choices was based on a record which 
contained the actual performance of the child on each item. The per- 
centage of the number of times each finger was chosen relative to the 
number of times it was presented was computed. Since one-finger
tests show relatively few errors, only the two-finger non-verbal test 
items four, eight, and ten were used in this calculation. The frequency 
of the fingers presented was for each age group: finger one, 440; finger 
two, 360; finger three, 320; finger four, 680; and finger five, 600. 
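
These presentation frequencies follow directly from the orders listed for items four, eight, and ten. The check below is a sketch that assumes, as the item descriptions state, that the same order is used on both hands and that each mental-age group contains twenty children; the constant and function names are illustrative.

```python
from collections import Counter

# Finger pairs as listed for the two-finger non-verbal items.
ITEM_4 = [(1, 5), (3, 2), (4, 1), (5, 4), (2, 4),
          (4, 5), (5, 3), (1, 4), (4, 3), (2, 5)]
ITEM_8 = [(1, 4), (3, 5), (1, 4), (4, 1), (5, 2),
          (1, 5), (4, 1), (5, 3), (4, 2), (5, 2)]
ITEM_10 = ITEM_4  # item ten uses the same order as item four

HANDS = 2
CHILDREN_PER_GROUP = 20

def presentations_per_group():
    """Number of times each finger is presented to one mental-age group."""
    per_hand = Counter()
    for item in (ITEM_4, ITEM_8, ITEM_10):
        for pair in item:
            per_hand.update(pair)
    return {f: per_hand[f] * HANDS * CHILDREN_PER_GROUP for f in range(1, 6)}

def choice_percent(times_chosen: int, times_presented: int) -> float:
    """Choice expressed in per cent of presentations (100 = no bias)."""
    return 100 * times_chosen / times_presented

print(presentations_per_group())
```

A finger chosen more often than it was presented gives a value above 100 per cent (preference); one chosen less often gives a value below 100 per cent (non-preference).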

Two outstanding trends are apparent: a preference for the middle 
finger, which decreases with an increase in mental age; and a non- 
preference for the fourth finger in the younger mental-age groups. 





* The reliability coefficient has not yet been ascertained since we intend to 
repeat the Finger Schema Test in a group of normal children and at that time to 
calculate its reliability. Those who wish to use the test as described here should 
take into account the experimental nature of this study. 










This seems to indicate that the least degree of differentiation, appearing 
particularly in the younger mental-age groups, exists between the third 
and fourth fingers with the third finger predominating. As is shown 
in Fig. 3, the decrease of errors with increase in mental age is reflected 
in the approach of preference and non-preference toward the 100 per
cent line.

Fig. 3.—Preference and non-preference of the five fingers. Choice is expressed in per
cent of total number of presentations of each finger. (Horizontal axis: finger 1 to 5;
vertical axis: choice in per cent.)

The preference of the middle finger may be explained on the basis 
of its conspicuousness within the perceptual field (central position, 
size); the non-preference of the fourth finger may be due to its having 
least significance of all the five fingers.* A more thorough analysis, 
though desirable, is not warranted by our data. 


RELATIONSHIP BETWEEN FINGER SCHEMA AND ARITHMETIC ACHIEVEMENT 


Finger agnosia as found in brain-injured adults is often associated
with an impairment of number concept (1, 2). Strauss and Werner (12)





* The thumb and little finger are conspicuous since they form the outer edges
of the hand. The second finger also forms an edge (sometimes younger children
begin to count the fingers with this finger as number one); also, finger two is
functionally important, viz. as the pointing finger. The fourth finger, however,
is outstanding neither as to configuration nor function.













on the basis of findings from two groups of mentally-retarded children, 
one advanced, the other retarded in arithmetic achievement, concluded 
that a relationship exists between finger schema deficiency and arithmetic
disability. In order to determine whether such a relationship
can be observed with our subjects and technique, the children of each 
mental age were ranked with respect to performance on the test. To 
compare individual scores of each child with one another and with the 
group, the raw scores on each item were transformed into sigma scores. 
For each child the mean of the sigma scores of the ten items was 
computed; this mean sigma score then defines his rank order within his 
mental age group. 
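
The transformation to sigma scores can be sketched as below. The text does not say whether the standard deviation was computed with n or n - 1, so the use of the population form (`pstdev`) is an assumption, as are the function names and the small data set.

```python
from statistics import mean, pstdev

def sigma_scores(raw):
    """Convert raw error scores on one item to sigma (standard) scores."""
    m, s = mean(raw), pstdev(raw)
    if s == 0:
        return [0.0] * len(raw)
    return [(x - m) / s for x in raw]

def mean_sigma_by_child(item_scores):
    """item_scores: one list per item, each holding one raw score per child.
    Returns each child's mean sigma score across the items."""
    per_item = [sigma_scores(item) for item in item_scores]
    n_children = len(item_scores[0])
    return [mean(item[c] for item in per_item) for c in range(n_children)]

# Hypothetical raw error scores: two items, three children.
scores = [[0, 2, 4], [1, 1, 4]]
print(mean_sigma_by_child(scores))
```

Sorting the children by this mean sigma score gives the rank order within a mental-age group; positive values indicate more errors than the group average.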

One cannot expect a strict correlation between the scores on the 
Finger Schema Test and arithmetic achievement since the mental age 
groups are relatively small. Therefore, only the children with extreme 
sigma scores were selected for this comparison. Children having a 
sigma error score of more than +.5 sigma were compared with those
children whose score was lower than -.5 sigma. The results of this
comparison have been combined into Table IV. The arithmetic
achievement has been expressed in terms of a ratio (1) between arithmetic
age and mental age ($\frac{AA}{MA}$), (2) between arithmetic age and
reading age ($\frac{AA}{RA}$), and finally, (3) between arithmetic age and the
mean of mental age and reading age, a ratio called the arithmetic index
($AI = \frac{2AA}{MA + RA}$). All these ratios were multiplied by 100.
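
The three ratios can be computed as in the following sketch. The ages in the example are hypothetical and serve only to show the arithmetic; the function name is likewise an assumption.

```python
def arithmetic_ratios(aa: float, ma: float, ra: float) -> dict:
    """Return the three ratios, each multiplied by 100.

    aa: arithmetic age, ma: mental age, ra: reading age (same units).
    """
    return {
        "AA/MA": 100 * aa / ma,
        "AA/RA": 100 * aa / ra,
        # Arithmetic age over the mean of mental age and reading age.
        "AI": 100 * 2 * aa / (ma + ra),
    }

# Hypothetical child: arithmetic age 8.0, mental age 8.5, reading age 7.5.
print(arithmetic_ratios(8.0, 8.5, 7.5))
```

A value of 100 means arithmetic age equals the reference age; in the example the child is below his mental age in arithmetic, above his reading age, and exactly at the mean of the two.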


In each mental age the three ratios of the children having high error
scores on the Finger Schema Test are less than those of children having
a low score. In other words, the children who do better on the Finger
Schema Test are better in arithmetic. By combining the cases into
two extreme groups (eleven cases in one and twelve in the other), the
ratio $\frac{AA}{MA}$ for the children having higher error scores on the Finger
Schema Test is 94, against 104 for the other group, a difference which
is significant at the two per cent level. Also, similar differences of
103 versus 114 for $\frac{AA}{RA}$ and 98 against 108 for AI are highly significant.


In summary, a relationship exists between performance on the
Finger Schema Test and arithmetic achievement. This relationship
holds true whether we measure arithmetic achievement in respect to
mental age, reading age, or both.

TABLE IV.—ARITHMETIC RATIOS OF CASES WITH EXTREME SIGMA SCORES ON THE
FINGER SCHEMA TEST

MA       σ-score     Number    AA/MA     AA/RA      AI
         (errors)
7/8      > +.5          3      104.3     107.0     105.6
         < -.5          5      109.4     114.2     112.0
8/9      > +.5          4       90.0     107.0      97.8
         < -.5          3       97.0     120.3     107.3
9/10     > +.5          5       89.8      98.0      93.4
         < -.5          3      101.3     108.0     103.3
Total    > +.5         12       94.0     103.0      98.0
         < -.5         11      104.0     114.0     108.0
t                                2.65      2.65      3.41
P                               <.02      <.02      <.01


SUMMARY AND CONCLUSIONS 


Originally, the Finger Schema Test was devised to be used in an 
analysis of functions which have a bearing on educational development. 
The test refers to the ability of the child to differentiate between the 
fingers of each hand. 

The test, which includes ten items, was given to eighty mentally-
retarded children ranging in mental age from six to ten with an average
IQ of 68.5. The results show the following:

(1) Accuracy in differentiating the fingers increases on each item 
and on the whole battery with increase in mental age. 

(2) Two groups of children showing extremely high and low scores
on the Finger Schema Test have corresponding arithmetic achievement
scores.

(3) Errors on the verbal items decrease more markedly than those on
non-verbal ones with increasing mental age.

(4) Preference in the choice of the middle finger and non-preference
of the fourth finger decrease with mental age.













At present, little can be said about the qualitative nature of the 
finger schema development. The only data obtainable come from 
psychopathology; these data seem to indicate that the organization 
of the finger schema is a rather complex phenomenon. Schilder (10, 11)
and others have observed that the pathological lack of recognition
of the fingers, so-called “finger agnosia,” can be found associated with
quite different types of impairment. This deficiency may be asso- 
ciated with tactile-kinesthetic disability; it may come about in 
consequence of a defect of visual form perception; or it may be mani- 
fested only as a disturbance in the naming of the fingers, and in this 
way be connected with a disturbance of the language functions. 
Though the pathological symptom of finger agnosia must not be con- 
fused with a genetic retardation of the finger schema, these observa- 
tions seem to suggest that tactile-kinesthetic, visual, and verbal 
factors play a part in the differentiating of the fingers. These factors 
may vary in significance during the developmental stages. As to the 
verbal factors in particular it should be recalled that our data show a 
greater decrease in errors in the verbal than in the non-verbal part 
of the test. This suggests that language factors play an increasingly 
important rôle in the construction of a finger schema during the
genetic stages. 

The connection between the finger schema and number concept 
formation may be understood in this way. A number concept in its 
proper sense depends on the use of a number system, and the number 
system in turn rests upon the organization of a schema which shows a 
relationship between the numbers. Such a schema may be a spatial 
reference pattern in which the numbers pertain to each other; as 
in the series 1-2-3-4-5-6-7-8-9-10 in which any number has a definite
position with respect to all others. The fingers of the hand furnish 
the child with a ready-made instrument in which elements can be 
related to each other in order to form such a schema; and it is well 
known that the hand is actually the primary instrument used in 
formulating a number system. To be sure, deficiency in the finger 
schema, being only one of various factors contributing to arithmetic
retardation, may or may not be present in any individual case of 
arithmetic disability." In general, however, children having a 
developmental impairment in the spatial relationships of fingers to a 
point where they cannot be properly used for a systematization of 
numbers are necessarily handicapped in the normal progress in 
arithmetic. 





REFERENCES

1. Gerstmann, J.: “Fingeragnosie.” Wien. klin. Woch., 1924, p. 1010.

2. Gerstmann, J.: “Ueber Fingeragnosie.” Jahrb. f. Psych., Vol. xiv, 1932, p. 135.

3. Gordon, H.: “Hand and ear tests.” Brit. Journ. of Psychol., Vol. xiii,
   1922-1923, p. 283.

4. Guilford, J.: Psychometric methods. McGraw-Hill Book Company, Inc.,
   New York, 1936, pp. 566, 55.

5. Hegge, T., Sears, R., and Kirk, S.: “Reading cases in an institution for
   mentally-retarded problem children.” Proc. Am. Assn. Stud. Feebleminded,
   56th Session, 1932, p. 149.

6. Lindquist, E.: Statistical analysis in educational research. Houghton Mifflin
   Company, Boston, 1940, pp. 266, 57.

7. Monroe, M.: Children who cannot read. The University of Chicago Press,
   Chicago, April, 1932, pp. 205, 14.

8. Piaget, J.: Judgment and reasoning in the child. Harcourt, Brace and
   Company, New York, 1928 (Fr. ed., 1924), pp. 260, 107.

9. Schilder, P.: The image and appearance of the human body. Psyche
   Monographs, No. 4, 1935.

10. Schilder, P.: “Localization of the body image (postural model of the body).”
    Proc. Ass. Res. Nerv. Ment. Dis., Vol. xii, 1934, p. 466.

11. Schilder, P.: “Fingeragnosie, Fingerapraxie, Fingeraphasie.” Nervenarzt,
    Vol. iv, 1931, p. 625.

12. Strauss, A., and Werner, H.: “Deficiency in the finger schema in relation to
    arithmetic disability (fingeragnosia and acalculia).” Amer. J. Orthopsych.,
    Vol. viii, 1938, p. 719.

13. Strauss, A., and Werner, H.: “Finger agnosia in children.” Am. J. Psychiat.,
    March, No. 5, 1939, p. 95.

14. Werner, H.: Comparative psychology of mental development. Harper and
    Brothers, New York, 1940, pp. 510, 172ff.

15. Werner, H., and Strauss, A.: “Problems and methods of functional analysis in
    mentally deficient children.” J. Abn. Soc. Psychol., Vol. xxxiv, 1939, p. 37.











INTELLIGENCE TESTING OF PARTIALLY-SIGHTED
CHILDREN¹


R. PINTNER 


Teachers College, Columbia University 


There are very few reports of the intelligence of children in sight- 
conservation classes. Those that are known to this author seem to 
have used the ordinary standard tests, whether group or individual, as 
prepared for children of normal vision. The question at once arises
as to whether the printed matter, whether pictures or letterpress,
suitable for normal vision, may or may not handicap the child with less
than normal vision. To forestall any such possible handicap all the
visual material of the New Revised Stanford-Binet Tests of Intelligence
was enlarged about one and one-half times by a photographic process.²
For the standard beads used in the various “bead chain” tests, large
beads about twice the size of the standard beads and a thicker cord were 
substituted. The one inch cubes were not enlarged. No attempt was 
made to enlarge the small objects which occur in tests below the five- 
year level, as these low levels were hardly ever needed with the popula- 
tion tested. This enlarged material was shown to several supervisors 
and teachers of partially-sighted children, and was declared to be very 
suitable for such children, whereas several items of the standard 
material were judged to be very unsuitable for children with below- 
normal vision. The Binet test with this enlarged material, this 
Enlarged Binet, was then given to all ten-, eleven-, and twelve-year-olds 
in as many sight-saving classes as could be covered. A few children 
above and below these ages were also tested by special request of the 
teachers. In addition, a few cases were given the Standard Binet, i.e. 
without enlarged visual material. 


THE TOTAL GROUP 


Table I shows the IQ distribution according to age of the cases 
tested. The medians for the various age groups range from eighty-one 
to ninety-seven. The total of six hundred two cases tested by the 





1 Some help in gathering this material was rendered by a W.P.A. project under 
the direction of the author, but the project was closed before the original plan could 
be completed. The writer wishes to acknowledge this help given by Project No. 
65-1-97-21 W.P. 10 of the Works Progress Administration. 

2 Permission to do this was generously granted by Professor L. M. Terman
and by the publishers, The Houghton Mifflin Company.






Enlarged Binet shows a median of ninety-three with a total range from
forty-four to one hundred sixty-six. The distribution shows a large
number of cases in the interval from seventy to eighty-nine. Only
twenty-seven cases were given the Standard Binet alone, but
the IQ distribution of these cases resembles the distribution of the
larger sample very closely. The three age groups, ten to twelve
inclusive, contain the best sampling of cases. The steady decline in IQ
from age ten to twelve may indicate a selective factor in that the more
intelligent child with minor visual defect may be more frequently
returned to a normal class than the less intelligent child with similar
visual defect.

TABLE I.—INTELLIGENCE QUOTIENTS ACCORDING TO AGE

                             Enlarged Binet, CA                      Standard
IQ               Below 10     10      11      12    Above 12  Total  Binet, all CA
130 and above        1         9       6       4                20        1
110-129              1        44      24      14        1       84        4
90-109               9       108      63      67        2      249        9
70-89                5        70      65      57        7      204       12
Below 70             1         9      17      17        1       45        1
Total               17       240     175     159       11      602       27
Q1                  78        85      80      80       77       82       82
Median              92        97      92      90       81       93       90
Q3                 102       107     102      99       83      104       99
Range           55-130    60-157  44-166  55-146   67-121   44-166   67-136
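
The marginal totals of Table I can be verified from the cell counts. In the sketch below the counts are those printed in the table; the labels of the two middle IQ intervals and the zero for the one empty cell are assumptions.

```python
# Enlarged Binet counts by IQ interval; columns are CA below 10, 10, 11,
# 12, and above 12. The empty cell in the first row is taken as zero.
COUNTS = {
    "130 and above": [1, 9, 6, 4, 0],
    "110-129": [1, 44, 24, 14, 1],
    "90-109": [9, 108, 63, 67, 2],
    "70-89": [5, 70, 65, 57, 7],
    "Below 70": [1, 9, 17, 17, 1],
}

row_totals = {label: sum(cells) for label, cells in COUNTS.items()}
column_totals = [sum(col) for col in zip(*COUNTS.values())]
grand_total = sum(column_totals)

print(row_totals)
print(column_totals, grand_total)
```

The row totals reproduce the printed Total column, the column totals reproduce the number of cases at each age, and the grand total is the six hundred two cases given the Enlarged Binet.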


RE-TESTS 


Table II shows the results in terms of IQ of various groups given 
re-tests at various intervals of time. For example, the first two col- 
umns in the table show that there were ninety-eight cases who were 
tested first by means of the Standard Binet and then re-tested by means 
of the Enlarged Binet after an interval of approximately one month. 
There is a slight increase in IQ as shown by the median and the quartiles.
The arithmetic mean of the individual gains and losses is +3.7
IQ points.

TABLE II.—THE RESULTS OF RE-TESTS

Interval            About one month      About one month      About six months     About one year
Order of tests      Standard  Enlarged   Enlarged  Standard   Enlarged  Enlarged   Enlarged  Standard
                    first     second     first     second     first     second     first     second
Q1                   82        87         85        86         81        80         78        83
Median               94        97         92        94         96        94         90.5      91
Q3                  105       108         99        99        109       112        102       105
N                    98        98         19        19         65        65         57        57
Average gain
  or loss               +3.7                 +1.4                 -0.6                 +2.5

The third and fourth columns show that only nineteen cases were
re-tested by the Standard Binet after a first test by the Enlarged Binet
after an interval of approximately one month. There is a slight but
insignificant gain in IQ for this group in spite of the fact that the Stand-
ard Binet should, theoretically, present a less favorable testing situation
for visually-handicapped children.

The fifth and sixth columns show re-tests with the Enlarged Binet
about six months apart. Here we find practically no change for the
group as a whole. The last two columns show the results for a group of
cases tested first with the Enlarged Binet and then re-tested about one 
year later with the Standard Binet. A very slight and insignificant 
gain is shown. 

Looking at the results of these re-tests as a whole, we can see no 
clear-cut evidence that the Standard Binet handicaps the average child 
in our sight-conservation classes. He seems to do about as well on the 
Standard Binet as on the Enlarged Binet. 


AMOUNT OF VISION 


For five hundred eighteen out of the six hundred two cases, usable 
records showing the amount of visual acuity possessed by each child 
were available. Most of these records were fairly recent, but some 
were a few years old. Assuming that these records are fairly accurate, 
at least for the comparison of groups, the cases have been divided into 
three broad groups. The main group consists of those having vision 
between 20/70 and 20/200 in the better eye after correction. This is 
the group generally considered “partially-sighted” in so far as visual
acuity alone is concerned. The other two groups flank this large central
group, and consist of those having better and those having poorer
vision than our main group.

TABLE III.—PERCENTAGE DISTRIBUTION OF IQ’S ON ENLARGED BINET ACCORDING
TO AMOUNT OF VISION

                  Poorer than   20/70 to   Better than   No
IQ                20/200        20/200     20/70         record   All cases
130 and above        0            4.9         2.3          1.2       3.3
110-129             15.4         12.6        15.9         13.1      13.9
90-109              30.8         39.3        44.9         40.5      41.3
70-89               38.5         33.3        32.2         39.3      33.9
Below 70            15.4          9.8         4.6          6.0       7.5
Total cases         13          285         220           84       602
Average IQ          91.6         95.1        96.0         98.2      95.1
σ                   18.2         19.9        16.1         16.2      18.1

The facts about the intelligence of these groups are shown in 
Table III. It will be noted that there are very few cases with vision 
poorer than 20/200, and this is to be expected in sight-conservation 
classes. On the other hand there is a considerable number of cases with 
vision better than 20/70. This group comprises 42.5 per cent of the 
five hundred eighteen cases with visual acuity records. This seems a 
slightly larger proportion of such cases than is usually found in sight- 
conservation classes. 
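
The case counts quoted for the vision groups are mutually consistent, as the short check below shows. The group sizes are those given in Table III; the variable names are illustrative.

```python
# Cases with usable visual acuity records, by vision group (Table III).
GROUPS = {
    "poorer than 20/200": 13,
    "20/70 to 20/200": 285,
    "better than 20/70": 220,
}
TOTAL_CASES = 602

with_records = sum(GROUPS.values())
without_records = TOTAL_CASES - with_records
share_better = round(100 * GROUPS["better than 20/70"] / with_records, 1)

print(with_records, without_records, share_better)
```

The three groups add to the five hundred eighteen cases with records, and the better-vision group makes up 42.5 per cent of them.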

The poorest vision group shows the lowest mean IQ. There is 
practically no difference between the mean IQ’s for the two other vision 
groups. The intermediate vision group (from 20/70 to 20/200) shows 
the highest percentage of high IQ cases, i.e. above 130 IQ. Whether
there is any significance in this, it is difficult to say. It may be that 
children with high IQ’s and with vision better than 20/70 but poorer 
than normal are in general getting on so well in normal classes that they 
are not likely to be assigned to sight-conservation classes. 

A study of the individual gains or losses on such re-tests as were 
available has been made. If the Standard Binet tends to handicap 
the partially-sighted child because of visual factors entering into the 
test, it should handicap those with greater visual defect more than 
those with lesser visual defect. For the thirteen cases with vision 
poorer than 20/200, no re-tests were available. Our only comparison, 
therefore, is between the two larger groups; namely, those with vision 













between 20/70 and 20/200, and those with vision better than 20/70. 
With reference to the former group, we have thirty-six cases tested with 
the Enlarged Binet and re-tested with the Standard Binet about one 
year later. This group showed a median gain of +3.5 IQ points. In
spite of their visual defect, they scored higher on the Standard Binet 
one year later. With reference to the latter group, (better than 20/70), 
we have twenty-four similar re-tests with a median gain of +2.5 IQ 
points. The superior vision of this group did not produce a higher 
median gain when re-tested on the Standard Binet. 

Again we have fifty-two cases of the 20/70 to 20/200 group showing 
a median gain of +4 IQ points from the Standard to the Enlarged Binet 
after an interval of about six months. Exactly the same median gain 
of +4 IQ points is made by thirty-three cases of the better than 20/70 
vision group for analogous re-tests. In other words, the re-test data 
available for these two vision groups give no evidence of any handicap 
suffered by these groups when tested by the Standard Binet as con- 
trasted with the Enlarged Binet. 


PREVIOUS IQ’S


On the school records of one hundred eighty-four out of our six 
hundred two cases were found IQ’s given previous to the present study. 
The records sometimes noted the test used as well as the date of the 
testing. Some of the IQ’s were four or five years old, but some were 
quite recent. Although we are fully cognizant of the errors entering into
such data, they have, nevertheless, been utilized for a comparison of means
and sigmas as shown in Table IV.


TABLE IV.—COMPARISON OF ENLARGED BINET IQ’S WITH PREVIOUS IQ’S
OBTAINED BY OTHER STANDARD TESTS

                              Average IQ           Standard deviations
Previous test         N     Enlarged  Previous     Enlarged  Previous
                            Binet     test         Binet     test
Group tests          120     96.2      90.2         18.6      15.1
Standard Binet        42     96.2      95.5         20.3      14.6
Not specified         22    100.9      91.4         19.9      13.7
Total                184     96.8      91.6         19.2      15.0








Table IV shows that one hundred twenty IQ’s had been 
derived from standard group intelligence tests. The mean IQ 










for these one hundred twenty children on these previously adminis- 
tered group tests is 90.2, as contrasted with a mean of 96.2 for these 
same children when tested on the Enlarged Binet for the present 
study. When the previous test was the Standard Binet, not given in 
connection with this study, we have a mean IQ of 95.5 as contrasted 
with 96.2 for the Enlarged Binet for forty-two cases; and again we 
note no substantial increase in IQ as a result of taking the Enlarged 
Binet. The twenty-two cases where the intelligence test was not 
specified show the greatest difference between the previous IQ and 
the present IQ derived from the Enlarged Binet, but, in view of the 
other comparisons, it is not safe to attribute this difference to the 
Enlarged Binet. Taking all the one hundred eighty-four previous
IQ’s, we find a mean gain of only 5 IQ points in favor of the Enlarged
Binet. It is surprising that these partially-sighted children have 
been seemingly penalized so little in all the previous tests given them, 
remembering that these tests included many types of group tests not at 
all adapted for children with visual defects, and remembering also that 
most of these tests were undoubtedly given by examiners who were 
probably not aware of any special visual defect on the part of those 
they were testing. 
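
The overall means in Table IV can be recovered from the three subgroup rows by weighting each mean by its number of cases. The sketch below uses the printed subgroup values; the helper name `pooled_mean` is an assumption.

```python
# (number of cases, mean IQ on Enlarged Binet, mean IQ on previous test)
ROWS = {
    "group tests": (120, 96.2, 90.2),
    "Standard Binet": (42, 96.2, 95.5),
    "not specified": (22, 100.9, 91.4),
}

def pooled_mean(pairs):
    """Mean of subgroup means weighted by subgroup size."""
    total_n = sum(n for n, _ in pairs)
    return sum(n * m for n, m in pairs) / total_n

enlarged = pooled_mean([(n, e) for n, e, _ in ROWS.values()])
previous = pooled_mean([(n, p) for n, _, p in ROWS.values()])

print(round(enlarged, 1), round(previous, 1), round(enlarged - previous, 1))
```

The pooled values agree with the printed totals of 96.8 and 91.6, a difference of about five IQ points.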


INFLUENCE OF BILINGUALISM 
From our various tables, we note that the mean IQ of our total 
group is 95.1, and we may ask the question as to whether this mean is 


TABLE V.—PERCENTAGE DISTRIBUTION OF IQ’S ON ENLARGED BINET ACCORDING
TO LANGUAGE SPOKEN AT HOME

                                          English and   No            All
IQ                 English    Foreign     foreign       information   cases
130 and above        4.1        1.6          5.7           2.2         3.3
110-129             15.1       10.4         20.0          12.2        13.9
90-109              46.3       34.5         40.0          33.6        41.3
70-89               28.7       44.0         31.4          39.8        33.9
Below 70             5.8        9.6          2.9          12.2         7.5

Total cases        344        125           35            98         602
Average IQ          97.5       90.6         98.6          91.0        95.1
σ                   18.1       16.9         20.4          17.2        18.1














Intelligence Testing of Partially-sighted Children 271 


depressed five points below the norm of 100 because of visual defect 
alone, or whether it is so depressed because of additional factors enter- 
ing into the lives of these particular children. There are many possible 
factors that might depress the IQ for such a group. The records 
of these children showed one factor that has been repeatedly shown to 
have an influence upon the IQ derived from a verbal intelligence test. 
For all but ninety-eight of the cases, information as to the language or 
languages spoken at home was available. 

Table V shows the percentage distribution of IQ’s for the three 
language groups. The two main groups are the English and Foreign. 
For the English group we have a mean IQ of 97.5, and for the Foreign 
a mean of 90.6. In all probability the mean IQ of the Foreign group 
is somewhat affected by the bilingual factor. It is interesting to note 
that the few cases reporting a foreign language as well as English show 
a mean IQ about the same as for the English group. Very likely these 
are families in which the main language for the children is English. 
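As a rough illustration of how much the Foreign group weighs on the total mean, the group means of Table V can be pooled with and without that group. This is a sketch only and not a computation reported in the study; the function name and the `exclude` parameter are assumptions made for the example.

```python
# Language groups of Table V: (cases, mean IQ on Enlarged Binet).
TABLE_V = {
    "English": (344, 97.5),
    "Foreign": (125, 90.6),
    "English and foreign": (35, 98.6),
    "No information": (98, 91.0),
}


def pooled_mean(groups, exclude=()):
    """Case-weighted mean IQ over all groups not named in `exclude`."""
    kept = [(n, mean) for name, (n, mean) in groups.items() if name not in exclude]
    return sum(n * mean for n, mean in kept) / sum(n for n, _ in kept)


all_cases = pooled_mean(TABLE_V)
without_foreign = pooled_mean(TABLE_V, exclude=("Foreign",))
print(round(all_cases, 1), round(without_foreign, 1))
```

The first figure reproduces the total mean of 95.1. The second, about 96.2, is merely what the remaining four hundred seventy-seven cases average when the Foreign group is set aside; it should not be read as a corrected estimate.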


SUMMARY AND CONCLUSIONS 


A group of six hundred two children in sight-conservation classes 
was tested by the New Revised Stanford-Binet Test. The group 
included almost all the ten-, eleven-, and twelve-year-olds in the classes 
examined, with the addition of a few cases outside of this age range. 
The visual material of the test was enlarged so as to avoid any possible 
handicap due to visual defect on the part of the child. Numerous 
re-tests at varying intervals were given, some with the standard mate- 
rial, and some with the enlarged. For the group as a whole, no obvious 
handicap was apparent when these children were given the standard 
visual material, although it is likely that some cases with severe visual 
defect were handicapped. 

The mean IQ for the six hundred two cases tested with the Enlarged 
Binet was 95.1. When divided into groups according to amount of 
vision, no difference in means was found between those having better 
than 20/70 vision and those whose vision was from 20/70 to 20/200. 
A very small group with vision poorer than 20/200 showed a lower 
mean IQ. Some re-test data with Standard and Enlarged Binets gave 
no sign that the 20/70 to 20/200 group was more handicapped in taking 
the Standard Binet than was the other group with vision better than 
20/70. 

A study of previous IQ’s obtained from many different kinds of 
tests dating back for varying lengths of time and given by many different
examiners showed on the whole lower IQ’s than those obtained by
the Enlarged Binet. The suggestion here is that these partially-sighted 
children are probably handicapped in taking the usual standard group 
intelligence tests. 

Although the mean IQ for the total group is 95.1, the fact that 
about twenty-one per cent were known to come from bilingual homes 
would lead to the conclusion that this mean IQ is also affected by bilin- 
gualism. In all probability the mean IQ for partially-sighted children 
in sight-conservation classes is only a few points below normal—some- 
where in the neighborhood of 96 or 97. It is interesting to note that 
this mean is about the same as that found for hard-of-hearing children. 

Although no evidence has been found in this study to support the
opinion that the standard visual material of the Binet Test is a definite
handicap for children in our sight-conservation classes, there is always
the danger, when testing these cases, that some particular individual
child may be so handicapped. Furthermore, teachers and supervisors
do not like to see these children given ordinary print to read or given
small pictures to look at. The writer, therefore, believes that the
psychological examiner would do well to use enlarged visual material
when testing these children. The use of such material will remove any
doubt from his mind and from the minds of teachers and supervisors as
to whether the child has been given a fair chance.








A DIAGNOSTIC SPELLING SCALE FOR THE COLLEGE 
LEVEL: ITS CONSTRUCTION AND USE 


THELMA G. ALPER 
Wellesley College 


INTRODUCTION 


Remedial spelling has lagged far behind in the development of 
remedial education. Yet the poor speller is almost more frequent 
than the poor reader both at the grade school and at the college 
levels. 5:17-18.27,29,80,49.49 Many poor spellers disguise their handicaps 
satisfactorily. Others bear less successfully the hurts of teacher 
condemnation and of futile hours of rote memory drill and often suffer 
serious personality changes. The writer has encountered a number of 
these students at the college level. Their problem is a real one and 
their need for remedial measures is pressing. 

A satisfactory remedial program must include adequate diagnostic 
tests and suitable remedial techniques. Both have been lacking in the 
field of spelling at the college level. Buros’ lists a number of high- 
school spelling tests which, presumably, could be standardized for the 
college level. However, none of these tests can serve as an adequate 
diagnostic tool since none provide a framework for analyzing and clas- 
sifying spelling errors. The situation with regard to remedial tech- 
niques has been no better. The literature is a confused mass of pros 
and cons on such topics as the value of spelling rules, the presence and 
treatment of hard spots, the use of isolated word-drill. 

The present research was undertaken at Wellesley College to study 
some of the practical problems involved. In this paper, only three 
aspects of that research will be considered: (1) The construction of a 
spelling test for the college level based on the most frequently mis- 
spelled words in the active written vocabularies of college students; (2) 
the diagnostic analysis and classification of the misspellings of college 
students; and (3) the development of specific remedial techniques 
adapted to the specific error-categories. 


PART I: CONSTRUCTION OF THE WELLESLEY SPELLING SCALE 


A. Principles of Construction 


A test designed for remedial purposes is perhaps less concerned with 
the total distribution of the function which it purports to measure than
with its ability to select the students most in need of remedial instruction.
For this reason the present test was constructed on the basis of
the most frequently misspelled words in the active written vocabularies 
of college freshmen rather than on the basis of the most frequently used 
words. Margaret Mitchell, Wellesley College, 1940, working under 
the direction of Dr. Edith B. Mallory and the writer, checked and 
tabulated the spelling errors appearing in over five thousand themes 
written during the academic year 1939-1940 for the required freshman 
course in English Composition at Wellesley. Record was kept of the 
frequency of each misspelled word, the form of the misspelling and the 
number of students who made each error. 

The themes yielded a total of 1340 misspelled words. These words 
were classified into 501 word-families, i.e., base words and their derivatives,
and arranged into five groups as shown in Table I. Group I
includes all words of which either the base form or a derivative was 
misspelled by five or more freshmen; Group II, word-families misspelled 
by four freshmen; Group III, word-families misspelled by three fresh- 
men; etc. 


TABLE I.—DISTRIBUTION OF THE 1340 WORDS MISSPELLED BY WELLESLEY COLLEGE
FRESHMEN ON REQUIRED ENGLISH THEMES WRITTEN DURING 1939-1940

Misspelled by          Word-      Total number of    Per cent of total
                       families   different words    misspellings
5 or more students        33            55                 4.1
4 students                42           107                 8.0
3 students                68           120                 8.9
2 students                78           778                58.1
1 student                280           280                20.9
Total                    501          1340               100.0














One of the most significant data of Table I is the fact that only 
fifty-five different words were included in the thirty-three word-families 
in the Group I words. These words were apparently quite uniformly 
difficult. On the other hand, the fact that only 4.1 per cent of all 
misspellings were common to five or more students raises the problem of 
whether a valid test for the college level can be constructed from these 
data. 
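The percentages in the last column of Table I are each group's share of the 1340 different misspelled words. A minimal Python sketch of that calculation follows; the group labels used as dictionary keys are shorthand adopted here for the number of students misspelling a word-family.

```python
# Different misspelled words per group, as given in Table I.
DIFFERENT_WORDS = {"5 or more": 55, "4": 107, "3": 120, "2": 778, "1": 280}
# Per cent of total misspellings as printed in Table I.
PRINTED_PER_CENT = {"5 or more": 4.1, "4": 8.0, "3": 8.9, "2": 58.1, "1": 20.9}

total_words = sum(DIFFERENT_WORDS.values())
per_cent = {group: 100 * n / total_words for group, n in DIFFERENT_WORDS.items()}

# Show the recomputed share next to the printed one for each group.
for group, value in per_cent.items():
    print(group, round(value, 2), PRINTED_PER_CENT[group])
```

The recomputed shares agree with the printed column to within a tenth of one per cent, and the five groups together account for all 1340 words.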

The conditions under which these word-families were misspelled 
seem to give them special significance. In the first place, student 
themes are written not under classroom conditions but in the dormitory
or library where dictionary reference is both permissible and
encouraged. In the second place, students are unlikely to include in 
their themes words they know they misspell unless they can refer to a 
dictionary or other source for help. In the third place, students 
frequently proofread each other’s papers since students are aware that
spelling errors may affect the ultimate grading of the paper. In the 
fourth place, statistically, errors in Group I and Group II are easily 
within the one per cent level of significance. It would seem, therefore, 
that a valid college spelling test could be compiled from them. 

Following the procedure of Anderson,? the thirty-three word-fami- 
lies misspelled by five or more students and the seventeen word-families 
misspelled by at least four students were used for a preliminary fifty- 
word spelling scale. In each word-family, the word-form most fre- 
quently misspelled in student themes was selected as the test word. 
These fifty words were then arranged in chance order. The modified 
sentence dictation form of test was chosen for presenting the test words 
to the students, since this form had been found most satisfactory by 
earlier investigators.*::!7-33.34.35 In this form of test, the tester pronounces
the word, then reads a simple sentence in which the word
occurs and, finally, repeats the word itself. The students are required 
to write only the test word. 

The fifty-word test, under the title of the Wellesley Spelling Scale,* 
was administered to three hundred fourteen freshmen during regular 
English Composition class periods. At the same time, the thirty words 
of the Progressive Achievement Test, Advanced Battery, Form A,* were 
given for comparison purposes.† Both tests were also given to ninety-
five upperclassmen with whom the tests and test results were discussed.
The upperclassmen were in universal agreement that all the words on 
the Wellesley Scale were in their active spoken and written vocabularies;
that college students should be able to spell these words; and that none was
unusual or unfair to include in a college spelling test.


* The Wellesley Spelling Scale will be referred to in subsequent sections of this
paper as the Wellesley Scale.

† The Progressive Achievement Test will be referred to in subsequent sections of
this paper as the Progressive Test. The results on the Progressive Test will be
included in the tables which follow but will not be fully discussed in the text.


B. Results Obtained by Use of the Test


1. Distribution of Scores.—A summary of the distribution of the
freshman scores on both the Wellesley Scale and the Progressive Test
is presented in Table II. The results are given in terms of the percentage
of students who obtained a score of at least a given per cent
correct. Thus, Table II reads: 0.6 per cent of the three hundred 
fourteen freshmen made a score of 100 per cent correct on the Wellesley 
Scale; 17.2 per cent made a score of 100 per cent correct on the Progressive
Test, etc. It is apparent that the Wellesley Scale was the more
difficult test, yet the negative skewness of the distributions, as shown in 
Table II, suggests that both tests were too easy for the population 
tested. This is a favorable aspect of the Wellesley Scale for we desire 
here not so much a measure of the distribution of spelling ability 
as an instrument which will aid in the selection and study of poor 
spellers. On the other hand, it is worthy of note that only three out
of the four hundred nine students (two freshmen and one junior) made
perfect scores on the Wellesley Scale. The freshman scores on the
Wellesley Scale in terms of per cent correct ranged from 56 per cent to
100 per cent with a mean of 84 per cent. The scores on the Progressive
Test ranged from 73.3 per cent to 100 per cent with a mean of 90 per
cent. In general, the quantitative and qualitative differences between
the freshmen and the upperclassmen were negligible. (Cf. Hartmann”
for similar findings.) Only the freshman data are presented at this
time.


TABLE II
Distribution of the 314 Freshmen scores on The Wellesley Spelling Scale and
the Progressive Achievement Test, Form A, expressed in terms of the possible per
cent score on the test and the per cent of students who attained or exceeded a
given per cent score.

                                   Wellesley Scale        Progressive Test
Per cent right on test             Per cent of students   Per cent of students
100                                      0.6                    17.2
90                                      39.8                    70.1
80                                      76.8                    91.7
70                                      92.7                   100.0
60                                      99.4
50                                     100.0
Average score in per cent               84%                     90%
Median score in per cent                86%                     93.3%
Standard deviation                       4.3                     1.92
Skewness*                               −1.33                   −0.70

* Computed by Kelley’s** formula: $Sk = \frac{P_{90} + P_{10}}{2} - P_{50}$.
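Since Table II is cumulative (the per cent of students who attained or exceeded each score), the share of freshmen falling within each ten-point band is obtained by differencing adjacent rows. The Python sketch below does this for the Wellesley Scale column; the function name `band_shares` is an assumption of the example.

```python
# Wellesley Scale column of Table II: per cent of freshmen scoring at least
# the given per cent right.
CUMULATIVE = {100: 0.6, 90: 39.8, 80: 76.8, 70: 92.7, 60: 99.4, 50: 100.0}


def band_shares(cumulative):
    """Per cent of students at or above each cutoff but below the next higher one."""
    shares = {}
    previous = 0.0
    for cutoff in sorted(cumulative, reverse=True):
        shares[cutoff] = round(cumulative[cutoff] - previous, 1)
        previous = cumulative[cutoff]
    return shares


shares = band_shares(CUMULATIVE)
for cutoff, share in shares.items():
    print(cutoff, share)
```

Read this way, about 39.2 per cent of the freshmen scored from 90 to 99 per cent right and 37.0 per cent from 80 to 89, which shows how heavily the scores pile up toward the top of the scale.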










TABLE III
Comparative difficulty of the test words as shown by the per cent of students
misspelling each rank order range of difficulty. Each rank order range includes
five words.

Rank order range of      Wellesley Scale        Progressive Test
difficulty               Per cent of students   Per cent of students
 1                            44.7                   27.6
 2                            31.7                   15.0
 3                            22.9                    6.0
 4                            17.3                    3.7
 5                            13.4                    0.8
 6                            11.5                    0.2
 7                             9.4
 8                             7.1
 9                             2.9
10                             1.0











2. Comparative Difficulty of the Test Words.—The suitableness of the 
Wellesley Scale for the college level is clearly evidenced by a survey of 
the difficulty of the individual words as shown in Table III. This table
reads: The five most difficult words on the Wellesley Scale (rank order
1) were misspelled by 44.7 per cent of the freshmen; the five least difficult
words (rank order 10) by 1 per cent of the freshmen, etc. The
most commonly misspelled word, ‘‘mantle,’’ was misspelled by 57 per 
cent of the freshmen. Only one word, ‘‘judgment,’’ was misspelled 
by no one. * 

The rank order range of difficulty on the Progressive Test was con- 
siderably smaller than on the Wellesley Test. No errors were made on 
13.3 per cent of the words and the most difficult word, ‘millinery,’ 
was misspelled by only 35 per cent of the freshmen. 

3. Analysis of the Spelling Errors—A detailed analysis of the 
difficulty of each of the words and of the type of errors found will be 
presented in the Manual of Directions which accompanies the revised 
form of The Wellesley Spelling Scale.† Two facts, however, are
of special interest here because of their significance for remedial 
instruction: 





* No errors were recorded in the test results for the word “judgment”’ since 
the only other spelling found (“judgement”) was accepted as correct on the 
authority of Merriam’s 1938 edition of Webster’s New International Dictionary, 
Unabridged. The word had been included in the scale originally because the 
English instructors were refusing “judgement” on themes. 

† The scale is now available in two alternate fifty-word forms and has been
standardized for both senior high school and college use. 







(a) Presence of “hard spots”:* A survey of the position of errors
occurring with a frequency of 50 per cent or more indisputably
establishes the presence of “hard spots” in these words. Ninety per
cent of the fifty Wellesley Scale words had one hard spot, 6 per cent had 
two hard spots and only 4 per cent showed no consistent form of 
misspelling. 

(b) Forms of Misspelling: The average number of different mis- 
spellings per word on the Wellesley Scale was 3.5. The range of the 
number of different misspellings was 0 to 18. 

Thus, whereas the “hard spot” data suggest uniformity of difficulty,
the number of different misspellings per word would lead us to
assume a rather wide variety of causes for the misspellings. 
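The hard-spot criterion used above (a place in the word misspelled by 50 per cent or more of those who misspell it, or by 40 per cent under Tireman's definition) can be made concrete with a short sketch. What follows is one possible reading only: locating the error by aligning each misspelling against the correct word with Python's `difflib` is an assumption of this example, not the procedure of the study, and the student counts are hypothetical. The misspellings of “rhythm” are those cited in the text.

```python
from difflib import SequenceMatcher


def error_positions(correct, attempt):
    """Indices in `correct` touched by any edit needed to turn it into `attempt`.

    A pure insertion is charged to the index at which it occurs.
    """
    positions = set()
    matcher = SequenceMatcher(None, correct, attempt)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if i1 == i2:  # insertion: no letter of `correct` is consumed
            positions.add(min(i1, len(correct) - 1))
        else:
            positions.update(range(i1, i2))
    return positions


def hard_spots(correct, attempts, threshold=0.5):
    """Indices misspelled by at least `threshold` of the students who erred.

    `attempts` maps each misspelling to the number of students producing it.
    """
    total = sum(attempts.values())
    tally = {}
    for attempt, students in attempts.items():
        for position in error_positions(correct, attempt):
            tally[position] = tally.get(position, 0) + students
    return sorted(p for p, n in tally.items() if n / total >= threshold)


# Hypothetical counts for illustration only.
attempts = {"rythm": 6, "rythem": 3, "rhythem": 1}
print(hard_spots("rhythm", attempts, threshold=0.5))
print(hard_spots("rhythm", attempts, threshold=0.4))
```

With these invented counts, only the silent “h” (index 1) qualifies at the 50 per cent level, while the 40 per cent level also admits the final “m” (index 5), before which an “e” was inserted. The example thus shows how the choice of percentage can change the number of hard spots found in a word.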


PART II. CLASSIFICATION OF SPELLING ERRORS 


A large number of investigators have been interested in the problem
of classification of spelling errors (cf. Foran'? and Spache*?:** for a 
review of the literature), but to date the classifications have contributed 
very little to the diagnosis of spelling errors. There are two main types 
of classification: Objective categories and subjective categories. 
Objective categories consist of such errors as the omission of letters, the 
substitution of letters, the transposition of letters, the failure to double 
a consonant. Subjective categories consist of errors of incomplete 
transfer, phonetic misspellings, mispronunciations, homonym confu- 
sions. Objective categories are relatively easy to set up but their 
value is limited. Subjective categories are very difficult to set up 
since operational definitions of subjective categories are difficult to 
formulate. Subjective categories, however, are essential at the college 
level if one is to establish an effective remedial program: Both the 
“what” and the “why” must be considered. The problem thus 
becomes one of definition of error-types such that different classifiers 
could arrive at identical classifications. 

In the present study spelling errors were arbitrarily classified with a 
view toward using the Wellesley Scale as a diagnostic determinant of the 
spelling difficulties of college students and, accordingly, a subjective 
classification which clearly suggests the cause of the error was undertaken.
Analysis of the data, as shown in Table IV, disclosed the
presence of seven categories of error: (1) Phonetic misspellings; (2)
distorted phonetic misspellings; (3) misspellings which might have
been avoided by the proper use of a spelling rule; (4) positive transfer
with negative transfer effect (51) resulting in incomplete detachment
from a root word, or in confusions with a foreign language word; (5)
homonym errors; (6) tension errors involving reversals, anticipations,
additions of letters, etc., at the hard spot of the word; and (7) errors of
word-substitution and of word-form.* The categories are abbreviated
in Table IV under the respective labels: Phonetic, Distorted Phonetic,
Rules, Negative Transfer Effect, Homonym, Tension-at-Hard-Spot
and Substitution.

* Tireman (47, p. 13) defines a “hard spot” as “any part of, or place in, the
word which is misspelled by 40 per cent or more of those who misspell the word.”
Other investigators (1, 12, 13, 18, 23, 31, 39, 46, 47, 49) more generally accept
a 50 per cent frequency and this latter percentage was used in the present study.


TABLE IV
Summary of percentages of error made by 314 Wellesley College Freshmen on
the Wellesley Spelling Scale and the Progressive Achievement Test, arranged
according to Type of Error.

                        Most frequent errors   Remaining errors      Total errors
Type of error           Wellesley  Progressive Wellesley Progressive Wellesley Progressive
Phonetic                  24.5       59.6        37.8      34.8        26.4      47.5
Distorted Phonetic        42.5       13.3        41.2      36.5        42.3      24.6
Rules                     15.7        8.4         5.1       6.7        14.2       7.6
Negative Transfer          3.1       11.9         1.4       0.5         2.8       6.4
Homonym                   14.2       ....        ....      ....        12.2      ....
Tension at Hard Spot      ....        6.6         4.8      17.5         0.7      11.9
Substitution              ....        0.2         9.7       4.0         1.4       2.0
Total                    100.0      100.0       100.0     100.0       100.0     100.0


Operational definitions of each of these categories are offered below:

* Although the first five categories could be subsumed under the general heading
“Errors of Transfer” there is every advantage in separating them as has been
done in the text.

(1) Phonetic Misspellings.—Phonetic misspellings are misspellings
which involve either the omission of a silent letter (or letters) or the
substitution of a letter (or letters) which has the same speech sound as
the correct letter (or letters) which it replaces. In other words,
phonetic misspellings are those in which the speller has allowed himself,
consciously or unconsciously, to be misguided by the sound of the
word. The misspelling of “rhythm” as “rythm,” “rythem” or
“rhythem,” of “exhilarating” as “exilarating,” of “Britannica” as
“Brittannica,” “Britanica” or “Brittanica” are samples of phonetic
errors.

Phonetic errors represent the second largest category of freshmen 
errors on the Wellesley Scale and the largest category on the Progressive 
Test. Twenty-six and six-tenths per cent of the total misspellings and
24.5 per cent of the most frequent errors on the Wellesley Scale were 
phonetic misspellings. (Table IV.) 

Earlier studies*-*:*! report a high percentage of phonetic misspell- 
ings at all grades and an increasing tendency to spell “by sound” 
through the school grades. This tendency reaches its peak, appar- 
ently, among those college students who lack efficient spelling rules or 
guides, as the studies of Carroll* and Gates and Chase!’ might have led 
us to predict. 

(2) Distorted Phonetic Misspellings.—Distorted phonetic misspell- 
ings may be defined as those misspellings which involve a distortion of 
the pronunciation of the word and a phonetic rendition of the resultant 
distortion. Examples of such errors are “ignorent” for “ignorant,”
“tolerence” for “tolerance,” “exhilerating” for “exhilarating,”
“affect” for “effect.”* The cause of the mispronunciation is not
necessarily the same from student to student. For example, some 
students may be misguided by the pronunciation of the speaker, by 
local speech peculiarities, by their own careless enunciation of words, or 
by lack of familiarity with, or rule for, the correct spelling. On the 
other hand, habitual misspellings may result in habitual mispronuncia- 
tions which serve to strengthen the misspelling. Whatever the basis 
for such errors, students not infrequently report that they test their 
spelling by the way in which they pronounce the word and pronounce 
the word by the way in which they spell it. Wherever the fault lies, 
the prevalence of these errors among college students is serious both as 
a matter of good diction and of good writing for distorted phonetic 
errors constitute the largest category of error on the Wellesley Scale and 
the second largest on the Progressive Test. Forty-two and three-tenths 
per cent of all errors and 42.5 per cent of the most frequent errors on 
the Wellesley Scale fall into this category, as do also 24.6 per cent of all 
errors and 13.3 per cent of the most frequent errors on the Progressive
Test. Phonetic errors are more likely to occur in the prefix of a word,
distorted phonetic errors in the suffix.

* The confusion between “affect” and “effect” may be more closely related to
homonym errors and errors of confusion in meaning than to the present class of
errors. However, the tendency to “try them out,” to sound “affect,” then
“effect” and to determine which should be used in a given case in terms of which
sounds better, suggests that such errors should be grouped under distorted phonetic
misspellings rather than under homonym and confusion of meaning errors.

(3) Rule Errors.—Rule errors are misspellings which might have
been avoided by the proper use of (1) a spelling rule (e.g. “i before e,
except after c, etc.”) or (2) a pronunciation guide (e.g. “c and g soft
before i, e and y”). Examples of such errors are “lonliness” for
“loneliness,” “developped” for “developed,” “mischeivous” for “mischievous.”
The misapplications of a rule to an “exception to the rule,”
e.g., “weird” spelled “wierd” and “argument,” “arguement,” are also
regarded as “rule” errors.
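To show how a rule of this kind might be applied mechanically, the sketch below tests a spelling against the bare form of “i before e, except after c.” It is a deliberately naive illustration: the function name and the regular expression are assumptions of this example, the “etc.” of the full rule is ignored, and nothing of the sort was used in scoring the scale.

```python
import re


def violates_i_before_e(word):
    """True if `word` has 'ei' not preceded by 'c', or has 'cie'.

    Only the bare rule is checked; its recognized exceptions are not handled.
    """
    lowered = word.lower()
    return bool(re.search(r"(?<!c)ei", lowered)) or "cie" in lowered


for spelling in ("mischeivous", "mischievous", "receive", "weird", "wierd"):
    print(spelling, violates_i_before_e(spelling))
```

The check flags “mischeivous” and passes “mischievous” and “receive,” as intended. It also flags the correct “weird” and passes the incorrect “wierd,” which is exactly the misapplication of a rule to an exception described above, and suggests why a rule is of use only when its exceptions are taught with it.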

Rule errors form the third largest category of errors on the Wellesley 
Scale both with regard to the most frequent errors, 15.7 per cent, and 
the total errors, 14.2 per cent. On the Progressive Test 7.6 per cent of 
all errors were “rule” errors.

The percentage of “rule” errors found in the present study is
dependent, of course, upon the number of words in the test to which 
rules apply. Twelve words out of the fifty on the Wellesley Scale were 
words to which our criterion of rule errors could apply. Rule errors 
were made on all twelve words and in all but four words the most fre- 
quent form of misspelling clearly indicated lack of familiarity with the 
proper rule. Rule errors, apparently, do form a significant category of 
error at the college level in spite of the fact that experimental studies 
seem to negate the value of teaching rules. (Cf. Foran’ for a review 
of the literature to 1934 and Gates!* for later studies.) It is possible 
that negative results in experimental studies derive from the use of 
poorly formulated rules rather than from the value of rules per se. 
English spelling is very much more lawful than most spelling books 
suggest and should, therefore, lend itself to the postulation of useful 
rules. 

(4) Negative Transfer Effect Misspellings.—Negative transfer effect
misspellings include all errors which involve (1) an incorrect carry-over
into the English spelling of the spelling of a foreign word or (2) the
incorrect carry-over of the spelling of the root or base of the word.
Examples of these errors are “ressemblance” from the French “ressemblance,”
“curiousity” from “curious,” “fourty” from “four,”
“absorbtion” from “absorb.”

Negative transfer errors constitute 2.8 per cent of all errors and 3.1 
per cent of the most frequent errors on the Wellesley Scale and 6.4 per 
cent of all errors on the Progressive Test. 

(5) Homonym Errors.—Homonym errors are misspellings which 
involve the substitution of a word identical in sound but different in
meaning and spelling. Strictly speaking, most homonym errors may
be regarded as phonetic errors but they have been classified separately 
to aid in the diagnosis of individual cases. 

Homonym errors represent 12.2 per cent of the total errors and 14.2 
per cent of the most frequent errors on the Wellesley Scale. Only one 
form of misspelling occurred for the words in question; i.e., the error
consisted, for all students who misspelled the word, in substituting the
homonym of the dictated word. Examples of errors in this category
are “mantle” for “mantel,” “principle” for “principal.” There were
no opportunities for homonym errors on the Progressive Test. 

(6) Tension-at-hard-spot Errors.—Errors which involve reversals of
letters, anticipations of letters, omissions of letters in the correct place 
and subsequent insertion of these letters in the wrong place are classified 
here as evidence of tension if such reversals, anticipations, omissions 
and additions occur at or in relation to the hard spot of the word. The 
writer feels justified in including such errors in an interpretative clas- 
sification since this category goes beyond the mere designation of 
“omitted letters,” e.g., and seeks the reason for the omission. Examples
of these errors are “analiyze” for “analyze,” “mischevious” for
“mischievous.”*

Students vary considerably in the way in which they manifest these 
tensions. This is shown especially by the fact that no errors which
constitute 50 per cent of the misspellings for the word in question, 
could be classified as tension-at-hard-spot errors. The degree to 
which the student is aware of hard spots, his habits of checking his 
papers for misspellings and his own peculiar psychological problems 
would give rise to marked differences in test results. 

Seven-tenths per cent of all Wellesley Scale errors and 11.9 per cent 
of all Progressive Test errors are classifiable as tension errors. 

* If there is evidence in an individual case that a student is mispronouncing
“mischievous” and is, therefore, misspelling it, the error should be regarded as a
distorted pronunciation error.

(7) Substitution Errors.—These errors include (1) substitutions of a
word from the sentence used to illustrate the meaning of the test word
in place of the test word and (2) errors of word-form, e.g., omissions of
the prefix or suffix of the test word, or additions of a prefix or suffix to
the test word. Examples of such errors in the present study were
“water” for “too,” “occurring” for “occurred,” “prefer” for “preferred.”
Although these errors represent a statistically insignificant
category in the group data, they are often of great importance in
individual diagnostic work. (Cf. Remedial Techniques, Part III,
below.) Only 1.4 per cent of all errors on the Wellesley Scale and 2.0
per cent of all errors on the Progressive Test are classified as substitution
errors.


PART III: REMEDIAL PRINCIPLES AND TECHNIQUES 


The primary function of the Wellesley Scale is to provide a short, 
easily administered, diagnostic spelling scale for the high-school and 
college levels.* Use of the test at Wellesley College has indicated 
that spelling errors are classifiable into the seven categories outlined in 
Part II, and that specific remedial techniques may be recommended 
to deal with the major difficulties which these categories represent. 


1. Aids for Phonetic Spellers 


Phonetic spellers at the college level resort to phonetic spellings for 
a variety of reasons, yet most of the difficulties represented by these 
students yield to a fairly routine remedial procedure. The most 
valuable remedial procedure consists in a rapid survey of Greek and 
Latin prefixes and of common Latin stems, with special stress on word- 
family study.f It is unnecessary, of course, to undertake an exhaus- 
tive review of the basic languages from which English is derived, yet the 
frequency with which the Latin stems and affixes enter into our present- 
day English more than justifies the survey recommended here. The 
additional work with Greek prefixes not only crystallizes for the student 
the réle of prefixes in general, but also frequently stimulates an interest 





* For use in individual cases the test should be supplemented by reading tests, 
both oral and written; tests of sensory acuity, especially vision and hearing**:**.*7.4°; 
consideration of the student’s attitude toward his misspellings; knowledge of 
foreign languages; dictionary habits; emotional maturity and academic level. A 
number of excellent suggestions are found in Steadman,“ Foran,!* Breed,* Sud- 
weeks,*® Gates'* and Dunn.** Although these latter authors are concerned pri- 
marily with the testing of spelling below the college level, the basic principles are 
essentially the same. 

† Experimental evidence of the value of a knowledge of Latin for English
spelling is largely negative. The writer has found, however, that the etymological
approach to spelling is extremely useful and that it can be used advantageously
even with students who have not previously studied Latin. It may be that
earlier studies have yielded negative findings because their authors assumed
that, if Latin is an aid to spelling, a knowledge of Latin would transfer
automatically to English spelling. But transfer cannot be relied upon to occur
automatically; it must be a conscious, carefully fostered process. The relationships
between English word-families and English affixes and Latin stems and
affixes must be pointed out explicitly to the average student and conscious habits
of transfer thus built up.










in and feeling for the English language which sheer spelling drill could 
never accomplish. The particular stems to be included for drill should 
be determined largely in terms of the word-families the individual stu- 
dent misspells in his written work. 

The explicit stress on word-families is of special significance for the 
phonetic speller’s problem. Isolated word-lists are futile and laborious 
at any grade level; at the college level they completely defeat the stu- 
dent’s needs. The phonetic speller, for the most part, resorts to the 
phonetic spelling just because he has no other guide. What he must be 
given, therefore, is a technique which will aid him not only in the spell- 
ing of specific words which he is known to misspell, but also in the 
spelling of new words not drilled on in the course of the remedial work. 
Thus, the primary importance of the etymological approach suggested
here is the confidence which “lawfulness” affords the phonetic speller:
a knowledge of prefixes, for example, makes the doubling of the “n” in
“annotate” and the absence of a double “n” in “analyze” no longer a
matter for guesswork or rote memory; stress on word-families makes
“unanalytic” as simple as “analyze” or “analysis.”

Another device aimed specifically at the correction of particular 
spelling demons consists in helping the phonetic speller to locate, when- 
ever possible, short, meaningful words at the hard spots of difficult 
words, as, for example, the “piece of pie” in “piece,” the “sin” in
“business,” the “le” of “rule” in “principle,” with which “rule” is
synonymous, etc. This device must be used with caution with phonetic
spellers, however, for the tendency to introduce meaningful words 
where none really exists is an appealing trap for these students. An 
overstress on syllabication is likewise probably disadvantageous for 
phonetic spellers. In general, visual imagery rather than auditory 
imagery should be encouraged. 

Further aid for phonetic spellers can be obtained from a careful 
study of pronunciation guides. The “e” in “noticeable” must be
retained, for example, to keep the “c” soft; doubling the “p” in “hoping”
would result in a word which no longer means “hoping.” Cf.
Steadman,⁴⁴ pp. 172. Thus, the problem which the phonetic speller
presents is certainly not as hopeless as suggested by Horn’s*! classic
reference to the 977,738,065,920 different phonetic possibilities in the 
spelling of the word “circumference.” 


2. Aids for Distorted Pronunciation Errors 


As was indicated above, errors of distorted pronunciation are most 
frequent on suffixes of words. Drill on suffixes and on word-families is, 










therefore, especially helpful. For example, knowledge of the fact that
English words taking -ance in general also take the -ancy, -ant, -ation and
-able suffixes and are, for the most part, derived from first-conjugation
Latin verbs is an invaluable aid to these students. To a certain
extent, drill of this sort is limited by the student’s knowledge of Latin, 
but stress on word-families broadens both the scope of applicability 
and interest in word-study. 

Particular attention should be paid with these students to their 
diction and pronunciation. Poor pronunciation may lead to misspell- 
ings based on the distorted pronunciation, or, on the other hand, the 
misspelling may be at the root of the distorted pronunciation. In 
either case, the end-product is the same and the vicious circle must be 
broken. Occasionally, the difficulty is related to an auditory defect, 
such that auditory recognition of the correct spelling cannot be an aid. 
In such cases, the remedial procedure outlined above is almost the only 
effective procedure. 

In the limited number of cases where the study of word derivations 
is not feasible either because of lack of time or lack of interest on the 
part of the student, the writer has resorted to listing for the student the 
most common -ance, -ant English words and then pointing out carefully 
the differences in meaning between words such as “dependent” and
“dependant.” This short-cut procedure is not to be recommended
except in special cases since it depends almost exclusively on rote
memory. However, use of noun suffixes -ation vs. -ition as in “toleration”
and “audition” for determining the “a” families vs. the “e”
families (tolerable, tolerance, audible, audience) can supplement rote
memory devices.
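
The -ation vs. -ition cue just described can be read as a simple lookup. The following is a minimal sketch in Python and is not part of the remedial procedure itself; the function name `vowel_family` and its return labels are hypothetical, and the cue is treated only as the rough guide the text describes, with no handling of exceptions.

```python
from typing import Optional


def vowel_family(noun: str) -> Optional[str]:
    """Suggest the "a" or "e" suffix family from a noun in -ation or -ition.

    Hypothetical helper: it encodes only the cue named in the text and
    returns None when the noun gives no cue.
    """
    word = noun.lower()
    if word.endswith("ation"):
        return "a"  # e.g. toleration -> tolerable, tolerance
    if word.endswith("ition"):
        return "e"  # e.g. audition -> audible, audience
    return None     # the cue is silent for other words


for noun in ("toleration", "audition", "rule"):
    print(noun, vowel_family(noun))
```

A lookup of this kind supplements, and does not replace, the study of word-families; a noun that has no -ation or -ition form gives the student no cue at all.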

Both auditory and visual stress on syllabication is important for 
these students, especially if emphasis is placed on the troublesome part 
of the word. Most students seem to be aided by underlining the hard 
part of the word, but the writer holds no particular brief for this 
practice. If, by chance, the underlining can be such as to involve a 
meaningful monosyllabic word within the longer “spelling demon,” the
underlining may serve as an additional attention-compelling factor. 
If the underlining involves merely a nonsense syllable conformation, the 
underlining has little, if any, memorial advantage. 


3. Aids for Spelling Rule Errors 


As was indicated above, authorities differ widely in their advocacy 
of the use of rules. Yet the writer has found that at the college level 
rules are a welcome relief from the chaos of unguided spelling. Rules 







are especially helpful when a deductive approach is employed both for 
the recognition of the rule and its application to families of words.*

The following simple procedure for the teaching of rules is recom- 
mended: (1) Determine from the test results and recent written work 
which rules are being violated; (2) dictate to the student several words 
in common usage which illustrate one of these rules; (3) immediately 
correct the student’s spelling whenever necessary; (4) require the stu- 
dent to deduce the rule himself from the examples you have dictated to 
him; (5) list the major exceptions, if any; (6) give practice in the
application of the rule and the exceptions to the rule.†

Since pronunciation guides are included here under “Rules,” the
student should be given every aid that pronunciation can provide for
spelling. Furthermore, the use of rules can be successfully supplemented
by the seeking for meaningful words within the word to be
spelled, as suggested above. For example, noting the “pie” in “piece”
and the “we” in “weird” will aid in the memory of both the “i before
e ...” rule and an exception to it.


4. Aids for Negative Transfer Effect Errors 


Certain types of negative transfer effect errors readily yield to the 
etymological approach outlined under Aids for Phonetic Spellers. For
example, the tendency to use the French spelling “ressemblance” for
“resemblance” is avoided by the recognition that the English prefix
is “re” and never “res.” Emphasis on the need for checking doubtful
spellings by means of pronunciation guides tends to do away with the
“curiousity” and “fourty” type of errors. Finally, a general recognition
of the fact that such errors are being made may in itself be sufficient
to bring about their disappearance. 


5. Aids for Homonym Errors 


These errors yield most readily to meaningful, associative devices. 
As suggested above, “principle” meaning “rule” is easily differentiated
from “principal” by recognition of the “le” of “rule.” “Mantel” is
differentiated from “mantle” by remembering that one could put a
telephone on a “shelf” (mantel) but not on a “cloak” (mantle), etc.





* Cf. Watson⁴⁹ for a fuller exposition of the deductive approach.

† Perhaps the major objections to the use of rules in the teaching of spelling
are attributable to the difficulty encountered in formulating rules concisely and in
such manner as to minimize the number of exceptions. The writer is preparing for
publication a detailed spelling manual which includes, in addition to an exposition
of various remedial techniques, a short list of rules found to be most useful for
high-school and college students.










6. Aids for Tension-at-Hard-Spot Errors 


Tension-at-hard-spot errors are essentially evidences of uncer- 
tainty and insecurity. Occasionally this insecurity is symptomatic of 
deep-seated ego-tensions and problems of personality maladjustments. 
More usually, however, they represent an absence of spelling guides 
and/or bad spelling habits of which the student is partially aware. In 
the latter cases, these errors frequently drop out of themselves if the 
student is given a usable set of rules and has been taught to seek out 
meaningful associations for the hard spots. With increased recognition 
of the semblance of lawfulness of English spelling comes an increased 
self-confidence and a decrease in tension errors. If tension errors per- 
sist in the face of the above devices, some form of psychological or 
psychiatric assistance should be considered, for the displacements of 
tension systems in the personality are devious and spelling and reading 
difficulties are not infrequently symptomatic of basic problems of 
maladjustment and insecurity. 


7. Aids for Substitution Errors 


These errors are significant only if they occur habitually. They 
may be a symptom of inattention, of some underlying sensory defect, 
or both. The underlying cause should always be carefully investigated
and corrected wherever possible. The simple device of seating a stu- 
dent with subnormal auditory acuity in the front row of seats or of 
fitting proper lenses, for example, may make for considerable improve- 
ment in general classroom work and in spelling, in particular. 


SUMMARY 


The primary function of the Wellesley Scale is to provide a short, 
easily administered, diagnostic spelling test for high-school and college 
levels. The test was constructed from the most frequent misspellings 
in freshman themes and consists of commonly misspelled words in the 
active written vocabularies of students at this level. Use of the test 
indicates that college students vary widely in their ability to spell, but 
the test is recommended primarily for the selection and diagnosis of 
individual poor spellers rather than as a wide-range testing instrument. 

Spelling errors are classified for diagnostic purposes into seven cate- 
gories. For each category, specific remedial measures are recom- 
mended. The error-categories are: (1) Phonetic; (2) distorted 
pronunciations; (3) inapplication or misapplication of spelling rules; 
(4) negative transfer effect; (5) confusion of homonyms; (6) errors 







which give evidence of tension at the “hard spot” of the word; and (7)
errors of substitution of a word or word-form. 

The actual percentage of errors of each error-category made on a 
spelling test by an individual student or by groups of students is partly 
a function of the particular spelling test used but the principle of 
construction of the Wellesley Scale gives its general error trends special 
importance for the college level. The high percentage of distorted 
pronunciation errors, for example, is a serious reflection both on the 
quality of the diction and the spelling ability of college students, while 
the prevalence of phonetic errors suggests a lack of familiarity with 
reliable spelling guides and a consequent resort to ‘‘the way in which 
the word sounds.” The frequency of rule errors, moreover, is a strong 
argument for the formulation and teaching of concise spelling rules. 

The evidence for the existence of “hard spots” at the college level is 
indisputable. Their presence further indicates a need for introducing a 
semblance of lawfulness in English spelling wherever lawfulness exists. 
Spelling rules, pronunciation guides, stress on word-families and on the 
relationship between English spelling and Latin derivatives are among 
the major ways of introducing this lawfulness. 


BIBLIOGRAPHY 


1. Almack, J. C. and Staffelbach, E. H.: “Spelling Diagnosis and Remedial
Teaching.” Elementary School Journal, Vol. xxxiv, 1934, pp. 341-350.

2. Anderson, W. N.: “Determination of a Spelling Vocabulary based upon
Written Correspondence.” University of Iowa Studies in Education, Vol. ii,
No. 1, 1921.

3. Ashbaugh, E. J.: “The Iowa Spelling Scales.” Journal of Educational Research
Monographs, No. 3, 1922, pp. 144.

4. Book, W. F. and Harter, R. S.: “Mistakes which Pupils Make in Spelling.”
Journal of Educational Research, Vol. xx, 1929, pp. 106-118.

5. Brandenburg, G.: “The Spelling Ability of University Students.” School and
Society, Vol. viii, 1917, pp. 26-29.

6. Breed, F. S.: How to Teach Spelling. Danville, N. Y.: Owen Pub. Co., 1930,
pp. 1-177.

7. Buros, O. K., edit.: The Nineteen Forty Mental Measurements Yearbook.
Highland Park, N. J.: Mental Measurements Yearbook, 1941, pp. 1309-1316.

8. Carroll, H. A.: “Effect of Intelligence upon Phonetic Generalization.”
Journal of Applied Psychology, Vol. xv, 1931, pp. 168-181.

9. Carroll, J. B.: “Knowledge of English Roots and Affixes as Related to
Vocabulary and Latin Study.” Journal of Educational Research, Vol. xxxiv, 1940,
pp. 102-111.

10. Cook, W. W.: “The Measurement of General Spelling Ability Involving
Controlled Comparisons between Techniques.” University of Iowa Studies in
Education, Vol. vi, No. 6, 1932, pp. 112.





11. Distad, H. W. and Davis, E. M.: “A Comparison of Column-Dictation and
Sentence-Dictation Spelling with Respect to Acquisition of Meaning of
Words.” Journal of Educational Research, Vol. xx, 1929, pp. 352-359.

12. Foran, T. G.: Psychology and the Teaching of Spelling. Washington, D. C.:
Catholic University Press, 1934, pp. 234.

13. Gates, A. I.: A List of Spelling Difficulties in 3876 Words. New York: Bureau
of Publications, Teachers College, Columbia University, 1937, pp. 166.

14. Gates, A. I.: Generalization and Transfer in Spelling. New York: Bureau
of Publications, Teachers College, Columbia University, 1935, pp. 80.

15. Gates, A. I. and Chase, E. H.: “Methods and Theories of Learning to Spell by
Studies of Deaf Children.” Journal of Educational Psychology, 1926, Vol.
xvii, pp. 289-300.

16. Gilbert, L. C.: “An Experimental Investigation of Eye-Movements in
Learning to Spell.” Psychological Monographs, Vol. xliii, 1932, pp. 1-81.

17. Gilbert, L. C.: “A Study of the Effect of Reading on Spelling.” Journal of
Educational Research, Vol. xxvii, 1935, pp. 570-576.

18. Guiler, W. S.: “Improving College Freshmen in Spelling.” Journal of
Educational Research, Vol. xxiv, 1931, pp. 209-215.

19. Hartmann, G. W.: “The Constancy of Spelling Ability among Undergraduates.”
Journal of Educational Research, Vol. xxiv, 1931, pp. 303-305.

20. Hildebrandt, E. L.: “Can High School Students Spell?” School Review, Vol.
xxx, 1924, pp. 779-781.

21. Horn, E.: A Basic Writing Vocabulary: 10,000 Words Most Commonly Used in
Writing. State University of Iowa, Monographs in Education, Series 1,
No. 4. Iowa City: The University, 1926, pp. 225.

22. Horn, E.: “A Source of Confusion in Spelling.” Journal of Educational
Research, Vol. xix, 1929, pp. 47-55.

23. Horn, E.: “Principles of Method in Teaching Spelling as Derived from
Scientific Investigation.” Eighteenth Yearbook of the National Society for the
Study of Education, Part II. Bloomington, Ill.: Public School Publishing
Co., 1919, pp. 52-77.

24. Kelley, T. L.: Statistical Method. New York: Macmillan, 1923.

25. Kiefer, F. A. and Sangren, P. V.: “An Experimental Investigation of the
Causes of Poor Spelling among University Students with Suggestions for
Improvement.” Journal of Educational Psychology, Vol. xvi, 1925, pp.
38-47.

26. Lester, J. A.: A Spelling Review. Milwaukee: Casper Co., 1935, pp. 110.

27. Lester, J. A.: “Teaching Freshmen to Spell.” English Journal, Vol. v, 1916,
pp. 404-410.

28. Lull, G. H.: “Plan for Developing a Spelling Consciousness.” Elementary
School Journal, Vol. xv, 1917, pp. 355-361.

29. Masters, H. V.: “A Study of Spelling Errors: a Critical Analysis of Spelling
Errors Occurring in Words Commonly Used in Writing and Frequently
Misspelled.” University of Iowa Studies in Education, Vol. iv, No. 4, 1917,
pp. 80.

30. McKee, J. H. and Conklin, R. J.: “College Spelling: Can It Be Improved?”
English Journal, College edition, Vol. xvii, 1928, pp. 43-49.

31. Mendenhall, J. E.: An Analysis of Spelling Errors. New York: Bureau of
Publications, Teachers College, Columbia University, 1930, pp. iii + 65.

32. Mendenhall, J. E.: “The Characteristics of Spelling Errors.” Journal of
Educational Psychology, Vol. xxi, 1930, pp. 648-656.

33. Miller, W. S.: Word Wealth. New York: Henry Holt, 1939, pp. vii + 344.
Appendix 2: Dunn, E. M.: Teaching of Spelling, pp. 323-327.

34. Nelson, M. J. and Denny, E. C.: “The Multiple Choice Spelling Test.”
School and Society, Vol. xliv, 1936, pp. 15-17.

35. Northy, A. S.: “A Comparison of Five Types of Spelling Tests for Diagnostic
Purposes.” Journal of Educational Research, Vol. xxix, 1936, pp. 339-346.

36. Phillips, D. P.: “Comparison of the Two Response and Dictated Recall Types
of Spelling Tests.” Journal of Educational Research, Vol. xxiii, 1931, pp.
17-24.

37. Pressey, L. C.: “An Investigation into the Elements of the Ability to Spell.”
Journal of Educational Research Bulletin, Vol. vi, 1927, pp. 203-204.

38. Progressive Achievement Tests, Advanced Battery, Form A. Los Angeles:
California Test Bureau.

39. Rogers, D.: “Teaching the Hard Spots in Words.” Chicago School Journal,
Vol. iii, 1926, pp. 256.

40. Russell, D. H.: “Characteristics of Good and Poor Spellers; A Diagnostic
Study.” Teachers College, Columbia University, Contributions to Education,
No. 727, 1937, pp. 103.

41. Simmons, E. P. and Bixler, H. H.: The Standard High School Spelling Scale,
revised edition. Atlanta, Ga.: Turner E. Smith and Co., 1935, pp. 63.

42. Spache, G.: “A Critical Analysis of Various Methods of Classifying Spelling
Errors, I.” Journal of Educational Psychology, Vol. xxxi, 1940, pp. 111-134.

43. Spache, G.: “Validity and Reliability of the Proposed Classification of Spelling
Errors, II.” Journal of Educational Psychology, Vol. xxxi, 1940, pp. 204-214.

44. Steadman, J. M., Jr.: Vocabulary Building. Atlanta: Turner and Co., 1937,
pp. 217.

45. Sudweeks, J.: “Practical Helps in Teaching Spelling: Summary of Helpful
Principles and Methods.” Journal of Educational Research, Vol. xvi, 1927,
pp. 106-118.

46. Tidyman, W. F.: Teaching of Spelling. Yonkers-on-Hudson: World Book Co.,
1919, pp. ix + 178.

47. Tireman, L. S.: “Value of Marking Hard Spots in Spelling.” University of
Iowa Studies in Education, Vol. v, No. 4, 1930, pp. 5-48.

48. Thorndike, E. L.: A Teacher’s Word Book: The Twenty Thousand Words
Found Most Frequently and Widely in General Reading for Children and
Young People. New York: Bureau of Publications, Teachers College,
Columbia University, 1931, pp. vii + 182.

49. Watson, A. E.: Experimental Studies in the Psychology and Pedagogy of Spelling.
Teachers College, Columbia University, Contributions to Education, No.
638, 1935, pp. 144.

50. Weller, L. and Broom, M. E.: “A Study of the Validation of Six Types of
Spelling Tests.” School and Society, Vol. xl, 1934, pp. 103-104.

51. Woodworth, R. S.: Experimental Psychology. New York: Henry Holt, 1938,
p. 176.













OWN ESTIMATE AND OBJECTIVE MEASUREMENT¹


SETH ARSENIAN 


Department of Psychology, Springfield (Mass.) College 


A. INTRODUCTION 


A freshman entering college—and there are nearly three hundred 
thousand entering colleges every year—makes an important decision, a 
decision that entails the spending of several thousand dollars of his own 
money or that of his parents; devoting several years of his life to attend- 
ing classes rather than working in factory, shop or office to earn an 
income; and studying hard for these years with the hope of becoming a 
professional man, a man of culture, or a “leader,” as some college 
presidents put it. Entering college is a major decision, probably the 
most important, at that stage of the development of the young man or 
woman. 

It might be assumed or hoped that such a major decision in the 
individual’s life would be based on a realistic evaluation of the factors 
that are involved in making such an important undertaking a success. 
One set of powerful factors related to the success of a college career, 
as shown by many and repeated investigations, includes a freshman’s 
scholastic aptitude, his achievement in common subjects, his adjust- 
ment, and his vocational interests and motivation. A realistic evalua- 
tion by a freshman of his strengths and weaknesses in these four areas 
would seem to be related to the success of his college career. 

The present study is an attempt to check on this assumption, and 
more specifically to answer the following three questions: 

(1) How close is the freshman’s own estimate of his scholastic 
aptitude, achievement, adjustment and interests to objective measure- 
ments of these same attributes? Objective measurements refer to 
tests, and it is assumed that the tests are more reliable and valid 
indicators of these various attributes than are self-estimates. The
many experiments reported in the literature show a higher correlation of
college success with these various attributes when they are measured
by good tests than when they are merely self-estimated.

(2) Does the experience of taking the test improve the closeness of 
the correspondence between the student’s own estimate and objective





¹ Credit is due to Joseph Goodner and Frank Turek, students at Springfield
College, for their assistance in the clerical and the statistical work for this study. 


measurement? What is the relation between the self-estimates before 
and after taking the tests? 

(3) What are the characteristics of students who grossly overrate 
or underrate themselves? What kind of college career do they have? 


B. THE EXPERIMENTAL SET-UP 


Springfield College, where this experiment was conducted, has a 
Freshman Days program. The freshmen arrive on the campus five 
days before the beginning of the classes. During these five days they 
undergo a rather thorough physical examination and testing program, 
take a number of psychological tests, get acquainted with each other, 
with the upper classmen, with the college in general, complete their 
registration and are set to go on Monday at 8 A. M., when the regular 
classes begin. 

In the Fall of 1940 as part of the Freshman Days program the enter- 
ing freshmen were given the following psychological tests: 

(1) The American Council Psychological Examination, 1938
Edition.

(2) The Coöperative English Test, Form OM, consisting of the
following three parts: Usage, Spelling, and Vocabulary.

(3) The Coöperative General Culture Test, Form O, consisting of
the following five parts: Social Studies, Foreign Literature, Fine Arts,
Science, and Mathematics.

(4) Strong Vocational Interest Blank, Revised Form M.

(5) Bell Adjustment Inventory, Student Form, consisting of the
following four parts: Home, Health, Social, and Emotional Adjustment.
(This one test was administered in classroom several weeks
after the opening of the school.)

Just before taking the A. C. E. Psychological Examination the 
students were asked to rate themselves on the ability that this test 
measures. A mimeographed sheet given to each student contained the
following directions:


TO THE FRESHMEN 


The Test which you are about to take attempts to measure your ability 
to comprehend, to see abstract relations, to reason, to think, to apply the 
knowledge you possess to new situations. This general ability is called
scholastic aptitude or intelligence. Your score will be compared with scores
on this same test of thousands of other freshmen entering colleges in this
country.













Will you make the best judgment you can of your own scholastic aptitude
in comparison with all freshmen entering this and other colleges and place a
check mark (√) in the appropriate column describing the classification to
which you think you belong.





Lowest 10 per cent   Between 10-25   Between 25-50   Between 50-75   Between 75-90   Top 10
of entering          per cent:       per cent:       per cent:       per cent:       per cent:
freshmen:            Inferior        Low average     High average    Superior        Very superior
Very inferior


























Name of student- 





Similar rating forms were used for the English and the General 
Culture Tests, in each case describing the subject-matter content of the 
test as explicitly as possible. Thus, for the English Test the directions 
were: 


The Test which you are about to take attempts to measure your knowledge 
in English Usage (includes grammar and diction, punctuation, capitalization, 
and sentence structure), Spelling, and Vocabulary. Your score will be com- 
pared with the scores on this same test of thousands of other freshmen entering 
colleges in this country. 


For the Strong Vocational Interest Blank, because of the require- 
ment of a different method of comparison, the directions for rating were 
different from the other tests described above and went as follows: 


The Inventory which you are about to fill out attempts to measure the 
extent to which your interests agree or disagree with those of successful men 
in twenty-seven occupations listed below. It is your interests and not your 
abilities for these occupations that are being measured. 

Will you make the best judgment you can of your likes and dislikes for 
different kinds of work, your preferences for activities connected with occu- 
pations, and your general pattern of interests (irrespective of your abilities), 
and then identify in the list below five occupations which in your opinion most 
closely correspond with your interests. The five occupations are to be marked 
as follows: 

If you believe that your likes and dislikes and the general pattern of your 
interests correspond most closely with that of successful men in a certain 




occupation place the numeral 1 in the parenthesis opposite the name of that 
occupation. Place the numeral 2 against the name of an occupation which 
you estimate to be second close to your pattern of interests, and so on until 
you have marked five occupations. 


There follows a list of twenty-seven occupations for which Strong 
has devised scoring keys. 

Because of the nature of the test, the rating on the Bell Adjustment 
Inventory was made after rather than before taking the test. The 
directions for rating were as follows: 


The questionnaire which you have just filled out attempts to measure 
your adjustment in four major areas; namely, home, health, social, and emo- 
tional. Your score will be compared with that of other college men who 
have filled out this same questionnaire. 

Will you make the best judgment you can of your own adjustment in com- 
parison with that of other men in this and other colleges and place a check 
mark (√) in the appropriate column describing the classification to which
you think you belong.


Bell’s classifications for describing the scores on this test were
used, as follows: Excellent, Good, Average, Unsatisfactory, Very unsatisfactory.
For the social scale the classificatory categories were: Very
aggressive, Aggressive, Average, Retiring, and Very retiring.

The student ratings for all tests except the Bell Inventory were 
made before taking the tests. For the A. C. E. Psychological Examination
and for the English tests ratings were made both before and after
taking the tests. 

Complete records of test results and ratings were secured for one 
hundred twenty-five freshmen constituting the total population for this 
investigation. 


C. RESULTS 


1. Correspondence of Initial Ratings with Test Results.—This cor- 
respondence was studied by two methods: (1) By calculating the 
Means, Standard Deviations and the Coefficients of Variation for test 
scores made by the students in each quarter (highest, high average, 
low average and lowest) of rating classification for each test separately, 
and (2) by computing the Contingency Coefficients between ratings 
and test norm placements by quarter categories. 

These results are indicated in Tables I and II. 
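
The Coefficient of Variation reported in Table I appears to be the standard deviation expressed as a percentage of the mean. The following Python sketch applies that conventional definition to three rows whose means and standard deviations are printed in Table I; the function name is illustrative only, and the formula is an assumption to the extent that the text does not state it.

```python
def coefficient_of_variation(mean: float, sd: float) -> float:
    """Standard deviation as a percentage of the mean (conventional definition)."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return 100.0 * sd / mean


# (test, quarter of own rating, mean, standard deviation) as printed in Table I
rows = [
    ("English Usage", "100-75", 51.36, 8.79),
    ("English Vocabulary", "75-50", 58.71, 8.76),
    ("General Culture Science", "100-75", 66.20, 8.93),
]

for test, quarter, mean, sd in rows:
    print(test, quarter, round(coefficient_of_variation(mean, sd), 2))
```

For these three rows the sketch gives 17.11, 14.92 and 13.49, which agree with the printed coefficients.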








TABLE I. INITIAL RATING AND TEST SCORE

                                   Quarter                      Standard    Coefficient    Contingency Coefficient
Name of test                       own rating     N     Mean    Deviation   of Variation   Obtained    Corrected

A. C. E. Psychological             100-75         6   100.33      19.23        19.16          .26         .30
  Examination                      75-50         72    78.75      18.83        23.91
                                   50-25         46    72.98      15.90        31.79
                                   25-0           1    73.00

Coöperative English: Usage         100-75        11    51.36       8.79        17.11          .24         .27
                                   75-50         64    48.83       6.36        13.03
                                   50-25         47    46.94       7.96        16.96
                                   25-0           3    42.33       8.35        19.73

Coöperative English: Spelling      100-75        19    56.79       9.81        17.27          .33         .38
                                   75-50         54    50.12      10.02        19.99
                                   50-25         36    43.84       8.70        19.84
                                   25-0          16    34.63       7.92        22.88

Coöperative English: Vocabulary    100-75         8    68.00      10.24        15.06          .49         .57
                                   75-50         63    58.71       8.76        14.92
                                   50-25         45    50.20       6.99        13.92
                                   25-0           9    43.60       7.44        17.06

General Culture: Social Studies    100-75         3    73.33       3.68         5.02          .35         .40
                                   75-50         59    38.20      21.21        55.52
                                   50-25         55    33.64      13.95        41.47
                                   25-0           8    33.88      18.35        54.22

General Culture: Foreign           100-75         1    [illegible]                            .42         .48
  Literature                       75-50          8    38.25      16.55        43.26
                                   50-25         59    20.97      12.40        59.10
                                   25-0          57    19.98      11.55        57.81

General Culture: Fine Arts         100-75         1    [illegible]                            .39         .45
                                   75-50         15    43.00      20.50        47.67
                                   50-25         66    31.47      14.90        47.34
                                   25-0          43    25.84      12.00        46.44

General Culture: Science           100-75         5    66.20       8.93        13.49          .38         .44
                                   75-50         49    52.52      13.90        26.47
                                   50-25         51    48.18      13.10        27.19
                                   25-0          20    41.25      12.87        31.20

General Culture: Mathematics       100-75         8    28.50       7.92        27.78          .44         .51
                                   75-50         32    22.97       8.37        36.44
                                   50-25         58    19.09       6.57        34.42
                                   25-0          27    14.65       6.20        42.32


























TABLE II. CORRESPONDENCE BETWEEN STATED AND MEASURED OCCUPATIONAL
PREFERENCES

Stated            Measured occupational preferences
occupational      on Strong inventory                  Agreements             Disagreements
preference          1     2     3     4     5        Number   Percentage    Number   Percentage

1                  31    19    13    16     2          81        66           42        34
2                  21    12     7     7     8          55        45           68        55
3                  14    13     8     7     8          50        41           73        59
4                   6     6     7     7     6          32        26           91        74
5                   4     9     8     8     8          37        30           86        70
Total              76    59    43    45    32         255        41          360        59
































Contingency Coefficient for 41 per cent “agreements”: .29; corrected, .33.


On the basis of the results presented in Tables I and II the following 
observations can be made: 

(1) The average test scores on the Psychological Examination, the 
three subtests in English, and the five subtests in General Culture show 
a descending order in size corresponding to students’ own rating from 
highest to lowest quarter. 

(2) The scatter or variation in the test scores is about as large for 
one quarter of student estimate as another. There is no discernible 
trend of increase or decrease in test score variation from one quarter 
of student estimates to another. 

(3) The correspondence between student estimate and test norm 
placement per quarter as expressed by the contingency coefficients 
(corrected) varies from .27 (English usage) to .51 (mathematics). If 
we assume for a moment that these corrected contingency coefficients 
are approximately equal to corresponding Pearson Product-Moment 
coefficients of correlation and calculate the Efficiency of Prediction of 
test scores from student self-estimates we find a range of four to four- 
teen per cent prediction better than chance. 

(4) With regard to the relation of stated and measured occupational 
preferences we find a forty-one per cent agreement for the first five 
stated preferences. The chances of the students’ stated occupational 
preferences falling among the five highest scores on the Strong Blank is 













Own Estimate and Objective Measurement 


TABLE III.—FINAL RATING AND TEST SCORE

                                                                      Standard    Coefficient    Contingency Coefficient
Name of test                     Own rating            Number  Mean   Deviation   of Variation   Obtained    Corrected

A. C. E. Psychological           100-75                   1    [illegible]                          .31         .36
Examination                      75-50                   38    84.37    19.35        22.93
                                 50-25                   82    75.21    17.01        22.62
                                 25-0                     4    54.50    14.36        26.35

Coöperative English—Usage        100-75                  14    56.79     7.98        14.05          .49         .57
                                 75-50                   58    48.72     6.30        12.93
                                 50-25                   43    45.43     5.22        11.49
                                 25-0                    10    45.50    12.05        26.48

Coöperative English—Spelling     100-75                  10    60.00    12.45        20.75          .57         .66
                                 75-50                   41    52.46     8.88        16.93
                                 50-25                   46    45.52     8.67        19.05
                                 25-0                    28    37.07     8.25        22.26

Coöperative English—Vocabulary   100-75                   4    66.25     9.83        14.84          .45         .52
                                 75-50                   38    60.32     9.93        16.46
                                 50-25                   54    55.16     8.22        14.90
                                 25-0                    29    46.97     7.32        15.58

Bell Adjustment—Home             Excellent               49     2.67     2.91       105.43          .70         .78
                                 Good                    57     4.26     3.36        78.87
                                 Average                 16     8.69     3.50        40.28
                                 Unsatisfactory           2    13.00     6.32        48.62
                                 Very unsatisfactory      1    22.00

Bell Adjustment—Health           Excellent               33     4.42     2.41        54.52          .52         .58
                                 Good                    80     5.98     4.20        70.23
                                 Average                  8     7.13     3.55        49.79
                                 Unsatisfactory           4    14.00     7.84        56.00
                                 Very unsatisfactory

Bell Adjustment—Emotional        Excellent                7     5.14      .95        57.39          .33         .37
                                 Good                    65     6.56     4.66        71.04
                                 Average                 51     9.87     5.42        54.91
                                 Unsatisfactory           2    11.50      .50         4.35
                                 Very unsatisfactory

Bell Adjustment—Social           Very aggressive          4     2.50      .87        34.80          .62         .69
                                 Aggressive              31     5.98     4.22        70.57
                                 Average                 69    10.07     6.08        60.38
                                 Retiring                18    17.72     6.04        34.09
                                 Very retiring            3    23.00     1.41         6.13































considerably higher for the students’ first choice than for any others. 
The degree of correspondence of the forty-one per cent ‘‘agreements” 
of the first five stated occupational choices in relation to measured 
interests is expressed by a corrected contingency coefficient of .33. In 
fifty-nine per cent of the cases the students’ first five stated occupa- 
tional choices do not agree with the five highest scores on the Strong 
Vocational Interest Inventory. 
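
The "four to fourteen per cent" figure in observation (3) can be checked with a short calculation. The sketch below assumes that the Efficiency of Prediction is the usual index of forecasting efficiency, 1 - sqrt(1 - r^2); the text does not print the formula, so this is an assumption, although it does reproduce the quoted range. The function name is illustrative only.

```python
import math


def efficiency_of_prediction(r: float) -> float:
    """Return 1 - sqrt(1 - r^2), the assumed index of forecasting efficiency."""
    return 1.0 - math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    # Lowest and highest corrected coefficients named in observation (3).
    for label, r in [("English usage", 0.27), ("Mathematics", 0.51)]:
        pct = 100.0 * efficiency_of_prediction(r)
        print(f"{label}: r = {r:.2f}, prediction better than chance = {pct:.1f} per cent")
```

On this reading, even the highest coefficient leaves most of the error of a pure guess in place, which is the point of the observation.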

2. Correspondence of Final Ratings with Test Results.—The methods 
of comparison are the same as in the case of the initial ratings. The 
results are shown in Table III. 

The figures in Table III show that: 

(1) The positive correspondence between average test scores and 
each quarter of the students’ ratings continues consistently. 

(2) Neither in absolute nor in relative variability of test scores is
any change indicated from initial to final rating. The scatter of scores
in the various quarters is about as large in the final as in the initial
rating and does not vary consistently or significantly from quarter to quarter.

(3) The relation between test scores and rating is considerably 
higher for the Bell Adjustment Inventory (except on the emotional 
scale) than for other tests. It should be recalled that the rating for this 
test was made after taking the test. 

(4) For four tests, namely the Psychological Examination and the
English Usage, Spelling, and Vocabulary tests, the corrected contingency
coefficients showing the extent of association between ratings and test 
scores are higher for final than for initial ratings. This indicates that 
the student’s experience in taking the test improves somewhat his own 
rating of his abilities. 
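
The "relative variability" referred to in (2) is the Coefficient of Variation column of Tables I and III. A minimal sketch, assuming the conventional definition (standard deviation as a percentage of the mean), reproduces the entries for the Psychological Examination in Table III; the function name is hypothetical.

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * sd / mean


if __name__ == "__main__":
    # (quarter, mean, standard deviation) for the Psychological Examination, Table III.
    rows = [("75-50", 84.37, 19.35), ("50-25", 75.21, 17.01), ("25-0", 54.50, 14.36)]
    for quarter, mean, sd in rows:
        print(f"{quarter}: coefficient of variation = {coefficient_of_variation(sd, mean):.2f}")
```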


TABLE IV.—COMPARISON OF RATINGS BEFORE AND AFTER TAKING TEST

                                    Mean              Standard Deviation
Test                           Before    After        Before    After
Psychological Examination      53.50     44.00        14.50     13.25
English—Usage                  53.50     52.25        16.75     19.75
English—Spelling               52.25     43.50        22.25     22.50
English—Vocabulary             51.00     39.50        18.00     20.00





3. Correspondence between Ratings Before and After Taking Tests.— 
Further examination of students’ ratings before and after taking the 
test indicates (Table IV) that there is a consistent reduction of the 










average rating from the first to the second rating. With the exception 
of one test (the Psychological Examination) the averages of the second 
ratings are closer to the median ranks of the group on these tests. Also 
(Table V) in examining the quarter discrepancies between test scores 
and ratings before and after taking the test we note with one exception 
(the Psychological Examination) a reduction in the discrepancy scores 
from the first to the second rating. However, in neither instance is 
there any consistent or noticeable change in the variations of the
ratings.


TABLE V.—DISCREPANCY BETWEEN TEST SCORE AND RATING BEFORE AND AFTER
TAKING TEST

                                    Mean              Standard Deviation
Test                           Before    After        Before    After
Psychological Examination       -.16      -.53         1.05      1.01
English—Usage                   1.09      1.03         1.01       .91
English—Spelling                 .75       .46          .98       .93
English—Vocabulary               .34      -.07          .99      1.04














The Contingency Coefficients between the ratings before and after 
taking the tests are: 





                               Obtained, C    Corrected, C
Psychological Examination          .52            .60
English—Usage                      .585           .68
English—Spelling                   .69            .80
English—Vocabulary                 .61            .70
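
The paper does not say how the "corrected" coefficients were obtained. The sketch below is one reading, offered as an assumption only: the obtained C is divided by the largest value C can reach in a square table of k categories, sqrt((k - 1) / k), with k = 4 for the quarter ratings and k = 5 for the five-step Bell ratings. This reproduces most of the printed pairs to within rounding, but the authors' procedure is not confirmed by the text.

```python
import math


def corrected_contingency(c_obtained: float, k: int) -> float:
    """Divide an obtained C by its assumed ceiling for a k-by-k table."""
    c_max = math.sqrt((k - 1) / k)
    return c_obtained / c_max


if __name__ == "__main__":
    # (label, obtained C, assumed number of rating categories)
    examples = [
        ("Psychological Examination, before vs. after", 0.52, 4),
        ("English Spelling, before vs. after", 0.69, 4),
        ("Bell Adjustment, Home (Table III)", 0.70, 5),
    ]
    for label, c, k in examples:
        print(f"{label}: obtained {c:.2f}, corrected {corrected_contingency(c, k):.2f}")
```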





4. Characteristics of Students Grossly Over- or Under-estimating 
Themselves.—Whenever the difference between a student’s own rating 
and the test score was one quarter or more, then it was counted an over- 
or under-estimate depending upon the direction of the difference. On 
the basis of this definition of over-under estimate there were a total of 
one hundred sixty-five (sixty-two per cent) over-estimates¹ and one
hundred one (thirty-eight per cent) under-estimates. The number and 
percentage of over-under estimates are shown in Table VI. 





1 One half of the over-estimates were on the English Test. 







TABLE VI.—OVER-UNDER ESTIMATES

Number of over-under
estimates                 Frequency    Per cent of 125
        0                     15            12.0
        1                     37            29.6
        2                     30            24.0
        3                     20            16.0
        4                     10             8.0
        5                      9             7.2
        6                      4             3.2

                             125           100.0
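
As a check on the arithmetic, the distribution in Table VI can be summed directly. The sketch assumes that the row printed between 4 and 6 is the row for five over-under estimates (the scan is damaged at that point); on that assumption the totals agree with the 165 over-estimates and 101 under-estimates reported in the text.

```python
# Number of over-under estimates -> number of students (Table VI).
# The key 5 is an assumed reading of a damaged row label.
FREQUENCY = {0: 15, 1: 37, 2: 30, 3: 20, 4: 10, 5: 9, 6: 4}


def summarize(freq):
    """Return (students, total over-under estimates, students with 5 or 6)."""
    students = sum(freq.values())
    estimates = sum(count * n for count, n in freq.items())
    gross = sum(n for count, n in freq.items() if count >= 5)
    return students, estimates, gross


if __name__ == "__main__":
    students, estimates, gross = summarize(FREQUENCY)
    print(f"students: {students}, over-under estimates: {estimates}, gross group: {gross}")
    print(f"over-estimates: {100 * 165 / estimates:.0f} per cent")
```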











In order to investigate the characteristics of the students with gross 
over-under estimates, thirteen students with five or six over-under 
estimates were compared with fifteen students who had made no over- 
under estimates. These two groups were compared with reference to 
age, scores on the Psychological Examination, and the Bell Adjustment 
Inventory. 


TABLE VII.—COMPARISON OF 0 WITH 5-6 OVER-UNDER ESTIMATES

                                                        Standard       “t”          Level of significance
Item for comparison          Group   Number    Mean     Deviation      test         (per cent)
Age                          0         15      19.47      1.05      [illegible]     [illegible]
                             5-6       13      19.26       .73
Psychological Examination    0         15      81.20     21.58         1.20             23
                             5-6       13      71.92     17.16
Bell—Home                    0         15       2.60      2.33         1.71             10
                             5-6       13       4.31      2.76
Bell—Health                  0         15       3.73      2.57      [illegible]     [illegible]
                             5-6       13       6.38      4.93
Bell—Emotional               0         15       6.40      3.24         1.20             23
                             5-6       13       8.92      7.00























There is no reliable age difference between the two groups. On the 
Psychological Examination and the Bell Adjustment Inventory¹ the 0
over-under estimate group has the more favorable scores. While the
differences between the two groups as tested by the Fisher “t” test do





1 The Bell social scale was not included in this comparison because both ends 
of that scale show maladjustment. 










not give a conservative level of significance, they are none the less con- 
siderable and are consistently in favor of the group that does not over- 
or under-estimate its abilities and adjustment. 
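
The legible "t" values in Table VII can be recovered from the printed means and standard deviations. The sketch below assumes a pooled-variance form of the Fisher test in which each printed standard deviation was computed with divisor N, so that the sum of squares is N times the squared deviation; this is an assumption, chosen because it reproduces 1.20, 1.71 and 1.20 for the three rows whose "t" can be read. The function name is illustrative.

```python
import math


def fisher_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t from summary figures, pooling the two sums of squares."""
    # Sums of squares rebuilt from SDs assumed to use divisor N.
    pooled_variance = (n1 * sd1 ** 2 + n2 * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))
    return abs(mean1 - mean2) / standard_error


if __name__ == "__main__":
    # (item, mean and SD of the 0 group, mean and SD of the 5-6 group)
    rows = [
        ("Psychological Examination", 81.20, 21.58, 71.92, 17.16),
        ("Bell, Home", 2.60, 2.33, 4.31, 2.76),
        ("Bell, Emotional", 6.40, 3.24, 8.92, 7.00),
    ]
    for item, m0, s0, m5, s5 in rows:
        print(f"{item}: t = {fisher_t(15, m0, s0, 13, m5, s5):.2f}")
```

With fifteen and thirteen cases the test has twenty-six degrees of freedom, which is why differences of this size fall short of the conventional levels.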

In addition to this comparison of the groups, individual analysis 
was made of the background and academic records of every student 
included in these two groups. The summary of the case studies of the 
two groups is presented in Table VIII. 


TABLE VIII.—SUMMARY OF CASE STUDIES OF 0 AND 5-6 OVER-UNDER ESTIMATES

                                                                      0 Group                5-6 Group
Item for comparison                                              Number    Per cent      Number    Per cent
1. Second trial at college (having failed in another college)                               2         15
2. On prescription during freshman year                             1          7            4         30
3. Dismissal or advised not to return at end of year                2         13            3         23
4. Personality problem—maladjusted                               [illegible]                7         54
5. [illegible]                                                     11         73            1          8





A study of the separate cases of these students shows in general that
students with no gross over- or under-estimates are more strongly
motivated (carry a heavier load and do better work). Their stated
vocational objectives are in greater harmony with the pattern of interests
shown on the Strong Blank, and the ratings on character and personality
made by the faculty counselors are much more favorable for these students
than for the students with 5-6 over-under estimates. The latter, on
the basis of the counselors’ notes in their folders, have more problems
and require, in proportion to their number, much more of the time of the
counselors than do the students with 0 over-under estimates.


D. SUMMARY AND CONCLUSIONS 


1. Major Findings.—(1) A freshman’s estimates of his abilities,
knowledges and interests—factors positively related to his academic
success—do not correspond highly with his actual possession of these
attributes as measured by objective tests. There is a wide variation
from subject to subject and from individual to individual in the
closeness or distance between self-estimate and objective measurement.

(2) The student’s estimates of his abilities and knowledges are more
closely related to objective measurement when the rating by the student
is done after taking the test than before. The variability in
estimate continues to be large.

(3) The ratings after taking the tests show a toning down in esti- 
mates and are more in the direction of the correct placement by the 
tests. 

(4) Students who grossly over- or under-estimate their abilities, 
knowledges and adjustment are as a group somewhat less intelligent 
and less well-adjusted. A larger proportion of these students, as 
compared with students who do not over- or under-rate themselves, are 
on prescription, are dismissed from college, or are the problem cases 
demanding proportionally more time from the faculty counselors. 

2. Discussion.—The progressive development of a sense of reality 
on the part of the individual is one of the major aims of education. By 
sense of reality is to be understood not only the correct evaluation of 
physical and social forces in our environment, but also a close estimate 
of the abilities, knowledges, interests and the adjustment of the individ- 
ual himself. To the extent that our evaluation of factors and forces 
within and without us is correct, to that extent a workable harmony 
between self and environment may be expected to follow. Contrari- 
wise, when our estimate or evaluation of these forces—actual or poten- 
tial—is incorrect, biased, or restricted, disappointments, failures, and 
frustrations may ensue. 

Formal education does passably well in supplying information and 
thereby assisting the growing individual’s sense of the reality of matter, 
energy and the forces in general in our environment. But the correct
evaluation of our own strengths and weaknesses, of the forces within us,
despite the old dictum of Socrates, “Know thyself,” remains insufficiently
achieved by formal education methods. The guidance movement,
with its emphasis on testing and on correct evaluation of the individ- 
uals’ strengths and weaknesses, attempts to fill this need of developing 
in the student a progressively correct sense of the reality of his own 
powers and limitations. It would seem that this systematic introduc- 
tion of the student to himself should not only be a major objective of 
the guidance work in schools, but should also constitute one of the 
criteria on which to base a judgment of the success or failure of a
guidance program.








SEX DIFFERENCES IN CLERICAL APTITUDE 
GWENDOLEN G. SCHNEIDLER AND DONALD G. PATERSON 


University of Minnesota 


Women and girls are markedly superior to men and boys on 
the Minnesota Vocational Test for Clerical Workers which involves 
the checking for identities or differences of paired numbers and paired 
names. In view of the literature on sex differences which reveals, in 
general, an absence of sex differences in measured abilities, these 
findings are of theoretical significance and present a challenge for 
explanation. 

These findings also have practical implications for persons responsi- 
ble for vocational counseling and the selection of clerical workers. The 
existence of these sex differences does not mean that men should be 
discriminated against as applicants for clerical positions nor that boys 
should be barred from entrance to commercial courses or from begin- 
ning positions in office work. Even so, it must be recognized that a 
larger proportion of women than of men and of girls than of boys are 
likely to possess this important component of clerical success. 

Sex differences on this test were first disclosed for a group of adults 
(five hundred men and two hundred thirty-two women) selected by the 
Employment Stabilization Research Institute at the University of 
Minnesota to represent a sample of the population of gainfully occupied 
adults (Green et al.⁶). Andrew and Paterson³ reported that only
eighteen per cent of the men on Test I (Number Checking) and sixteen 
per cent of the men on Test II (Name Checking) reached or exceeded 
the median for the women. Table I shows an average difference of 
thirty points in favor of the women on Test I and a difference of thirty- 
four points in favor of the women on Test II. These sex differences are 
statistically significant as indicated by ratios of differences to standard 
errors of the differences. Despite the fact that these samples were 
selected to be representative of the adult employed population as a 
whole, it was suspected by some that the revealed differences resulted 
from occupational differences among men and women with an over- 
weighting of the clerical groups among gainfully occupied women. It 
is for this reason that a series of studies was undertaken to determine 
whether or not these sex differences would disappear when samplings 
of the two sexes were obtained at the lower age and grade levels. 
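
The "ratios of differences to standard errors of the differences" can be recomputed from the general-population rows of Table I. The sketch assumes the usual large-sample form, in which the standard error of the difference is the square root of the sum of the two squared standard errors of the means; the paper does not print the formula, so this is an assumption, though it does return the tabled ratios of 12.4 and 11.8.

```python
import math


def critical_ratio(n1, mean1, sd1, n2, mean2, sd2):
    """Difference between two means divided by its standard error."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return abs(mean1 - mean2) / standard_error


if __name__ == "__main__":
    # General population, Table I: men (N = 491) and women (N = 229).
    test_1 = critical_ratio(491, 83.1, 29.2, 229, 113.1, 30.8)
    test_2 = critical_ratio(491, 77.6, 34.3, 229, 111.3, 36.5)
    print(f"Test I (Number Checking): critical ratio = {test_1:.1f}")
    print(f"Test II (Name Checking):  critical ratio = {test_2:.1f}")
```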

Data for employed clerical workers also show sex differences, although
of a smaller order. Table I indicates that sex differences on the clerical




test are smaller when men and women clerical workers are studied. 
Although women are still superior to men on the average, the small 
differences are not in every case statistically significant. The overlap- 
ping of the sex populations is also much greater, as can be seen from this 
table. From one-third to nearly one-half of the men in these types of 
clerical positions reach or exceed the median for the women in similar 
positions. 


TABLE I.—DIFFERENCES IN CLERICAL TEST SCORES BETWEEN MEN AND WOMEN
IN THE GENERAL POPULATION AND IN SIMILAR CLERICAL TYPES OF WORK*

                                                                                      Percentage of men
                                                                         Critical     reaching or exceeding
Group                          Test     N     Mean     SD    Difference  ratio        median for women
General population, men         I      491    83.1    29.2     30.0       12.4             18
General population, women       I      229   113.1    30.8
General population, men         II     491    77.6    34.3     33.7       11.8             16
General population, women       II     229   111.3    36.5
General clerical, men           I       41   136.1    27.9      2.5        0.4             46
General clerical, women         I       59   138.6    26.3
General clerical, men           II      41   131.0    29.4     13.4     [illegible]     [illegible]
General clerical, women         II      59   144.4    28.9
Routine clerical, men           I       30   127.5    23.8      2.9        0.4             35
Routine clerical, women         I       24   130.4    25.3
Routine clerical, men           II      30   123.0    27.7      6.8        0.9             31
Routine clerical, women         II      24   129.8    29.4








* Basic data for calculations from Andrew, D. M., and Paterson, D. G., “Measured
Characteristics of Clerical Workers.” Employment Stabilization Research
Institute Bulletin, Vol. III, No. 1. Minneapolis: University of Minnesota Press,
July, 1934, pages 17, 20, 56.


Schneidler’s normative study of the clerical test scores for a typical
population of junior- and senior-high-school pupils (Grades VIII through
XII) reveals that there are large and significant sex differences at each
of the grade levels studied.⁹ The description of the groups studied in
terms of intelligence, age-grade location, and parental occupation is
given elsewhere.¹⁰ There is sufficient evidence that these factors were
not operating to weight the samples in such a way as to make them
selective in these respects.

Results of this investigation, as shown in Tables II and III, can 
leave no question but that boys and girls differ markedly in checking 
identities and differences in paired numbers and names. These average
differences are statistically significant and would probably recur in
further sex samplings at Grades VIII and XII.


TABLE II.—DIFFERENCES IN CLERICAL TEST SCORES (TEST I, NUMBER CHECKING)
BETWEEN BOYS AND GIRLS AT GRADE LEVELS V THROUGH XII

                                                  Critical    Percentage of boys reaching or
Grade   Sex    N     Mean*    SD*     Diff.*      ratio*      exceeding median for girls
m | Fl sa3 |izsa | a5] 16 | 18 26 
= | lex |uey | wel? | 2 19 
s | > | ae ltes bone) A) MO 19 
= | Flee {irs | is6| 28] 87 . 
VIII . oe i age 13.5 7.8 21 
m | lio | sa | iso| %8| 43 a 
a |e] | sa | wea)! ] 47 19 
Y tel o| wal] wel?) + 19 


























* These statistics were originally carried to four decimal places and have been 
rounded off to one decimal place for presentation here. 


The last column of Tables II and III shows the percentage of the
boys who reach or exceed the median score for girls in each test and at
each grade level from VIII through XII. These figures indicate that only
from one-fifth to one-fourth of the boys make equal or higher scores
than the median girl. The data show clearly that sex differences at
all of these grade levels are equally large. At first glance this inference
might not be apparent because of the increase in the difference between
average scores for boys and girls from Grade VIII through Grade XII. This
phenomenon, however, is related to the increase in mean scores with age. The
small and relatively stable percentage of overlapping of scores between
boys and girls indicates that the differences are as large at Grade VIII
as they are at Grade XII.
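
The argument that overlap, not the raw difference in means, is the proper gauge can be illustrated with a rough calculation. The sketch below rests on two assumptions that are not made in the paper: that the boys' scores are normally distributed, and that the girls' median equals the girls' mean. Under those assumptions the expected share of boys at or above the girls' median depends only on the difference expressed in units of the boys' standard deviation. Figures are taken from Table III (Test II); the observed percentages are 23 at Grade XII and 19 at Grade VIII.

```python
import math


def share_at_or_above(threshold: float, mean: float, sd: float) -> float:
    """Normal-curve proportion of a group scoring at or above a threshold."""
    z = (threshold - mean) / sd
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # (grade, girls' mean used as their median, boys' mean, boys' SD)
    rows = [("XII", 123.4, 107.0, 24.1), ("VIII", 93.3, 77.4, 17.0)]
    for grade, girls_mean, boys_mean, boys_sd in rows:
        share = share_at_or_above(girls_mean, boys_mean, boys_sd)
        print(f"Grade {grade}: expected overlap about {100 * share:.0f} per cent")
```

The normal-curve values come out within a few points of the observed percentages, which is as close as the two simplifying assumptions warrant.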


TABLE III.—DIFFERENCES IN CLERICAL TEST SCORES (TEST II, NAME CHECKING)
BETWEEN BOYS AND GIRLS AT GRADE LEVELS V THROUGH XII

                                                       Critical       Percentage of boys reaching or
Grade   Sex    N      Mean*         SD*          Diff.*         ratio*         exceeding median for girls
XII     M     539    107.0         24.1
        F     538    123.4         24.6           16.4           11.0                  23
XI      M     381     99.5         23.6
        F     427    118.2         22.4           18.8           11.6              [illegible]
X       M     372     92.8         22.9
        F     416    108.9         24.0        [illegible]        9.6              [illegible]
IX      M     332     86.2         22.3
        F     327    101.2         21.3           14.9            8.8              [illegible]
VIII    M     284     77.4         17.0
        F     288     93.3         20.0           15.9           10.3                  19
VII     M     116     65.2         16.4
        F     120     75.7         16.8           10.5            4.9                  25
VI      M      67     64.7         15.5
        F      75  [illegible]  [illegible]    [illegible]    [illegible]              27
V       M      83     53.7         14.5
        F      39     66.9         18.0           13.2            4.0                  22


























* These statistics were originally carried to four decimal places and have been 
rounded off to one decimal place for presentation here. 


Further evidence of sex differences on this test was presented in a
thesis prepared by Loevinger.⁷ All pupils in a 7B grade of an average


Minneapolis junior high school were given, among other tests, the 
Minnesota Vocational Test for Clerical Workers. Data are presented 
for one hundred sixteen boys and one hundred twenty girls for whom 
the information was complete. Tables II and III show the essential 
data for these boys and girls. The absolute differences at this grade 
level are of the magnitude of ten points on the average and these dif- 
ferences are also statistically significant. Again, only one-fourth of the 
boys reach or exceed the median of the girls. 

Tables II and III also present further data on this same problem 
from an unpublished study completed by Thatcher at the fifth- and 
sixth-grade levels.¹¹ The subjects for the Thatcher study were St.
Paul public school children of whom eighty-three boys and thirty-nine 
girls were in the fifth grade and sixty-seven boys and seventy-five girls 
were in the sixth grade. The clerical test was administered to all fifth- 
and sixth-grade pupils in three schools selected to be typical of pupils 
from ‘‘middle-class’’ homes. The results may not be exactly com- 
parable to the results for the other studies, since the standard method- 
ology was revised slightly to allow for differences in mental level and 
understanding of directions. It was thought that more preliminary 
practice than is allowed by the few samples in the standard administra- 
tion would be required at this age level. For this reason a practice 
page with fifty items similar to those in the numbers test and fifty 
items similar to those in the names test was used. The examiner 
observed the performance of each child on the practice page to make 
certain that instructions were understood and then collected the papers 
and proceeded with the administration of the test proper. The stand- 
ard scoring method was used. 

It is evident from Thatcher’s results also that the sex differences 
in number and name checking ability are large. The ratios of dif- 
ferences to standard errors of differences are all above four. For Grade 
v the overlapping amounts to only nineteen per cent on Test I and 
twenty-two per cent on Test II. For the sixth grade the overlapping is 
nineteen per cent on Test I and twenty-seven per cent on Test II. 

These striking sex differences in clerical aptitude are difficult to 
explain. At the present time we have only hypotheses to offer. 

First, it might be contended that the differences are due to differential
clerical training and experiences. Andrew and Paterson,³
however, have investigated the problem of the effects of clerical train- 
ing and experience on clerical test scores and contend that these factors 
in themselves do not markedly affect the scores. Our data also negate 




the possible contention that equalization of experience and opportuni- 
ties for practice reduces sex differences, for we consistently find these 
differences from the fifth through the twelfth grade where opportunities 
for clerical experiences are about equal for the sexes. We can conclude, 
then, on the basis of a logical and somewhat limited investigation of this 
question, that these sex differences are not due to differences in clerical
training and experience. 

Another possible hypothesis is that the lack of significant difference 
in clerical aptitude which we find on the average between men and 
women employed in similar clerical situations is due to an interest factor 
which draws into and maintains in clerical jobs persons who have abili- 
ties and interests in detail of the sort needed on clerical jobs. It is 
suggested that this interest factor is related to proficiency both on the 
clerical aptitude test and on the clerical job. For this reason we fail to 
find on the clerical job the sex differences in clerical aptitude which are 
found in every other situation we have investigated. Although strik- 
ing sex differences appear in the general adult population, men and 
women who have this interest and aptitude in clerical detail are the 
ones who are drawn into clerical work and who remain and become 
proficient in it. 

There may be a sex difference in interest in clerical detail rather than 
a sex difference in native aptitude. It is quite possible that this sex 
difference in interest is not a native one, but is developed by differential 
environmental and social pressures. Girls, for example, may somehow 
discover that it is to their advantage in the school situation to be more 
interested in detail while most boys may develop an attitude that they 
are somehow destined to deal with larger and more important issues in 
life. If these differential attitudes exist, it is clear that they are 
developed at a very early age, i.e., prior to entrance to the fifth
grade. 

We may state, parenthetically, an additional hypothesis that some 
of the differences on the average between boys and girls in academic 
achievement and scholarship may possibly be due in part to this
difference in “clerical aptitude” which in turn might be attributed to a
difference between the sexes in “clerical interest.” Paterson and
Langlie⁸ attribute sex differences in scholarship to superiority of girls in
penmanship which is reflected in grades earned. We have found that 
girls on the average have more “clerical aptitude” than boys. This 
additional sex difference may further account for their ability to 
accumulate more of the grade rewards for academic achievement. 










The above speculations are presented only as logical interpretations 
of our findings and should, of course, be studied by further systematic 
investigation. 


REFERENCES 


(1) Andrew, D. M.: “An Analysis of the Minnesota Vocational Test for Clerical
Workers, I.” J. Appl. Psychol., Vol. XXI, 1937, pp. 18-47.

(2) Andrew, D. M.: “An Analysis of the Minnesota Vocational Test for Clerical
Workers, II.” J. Appl. Psychol., Vol. XXI, 1937, pp. 139-172.

(3) Andrew, D. M., and Paterson, D. G.: “Measured Characteristics of Clerical
Workers.” Bulletin of the Employment Stabilization Research Institute.
Minneapolis: University of Minnesota Press, Vol. III, No. 1, 1934.

(4) Andrew, D. M., and Paterson, D. G.: Minnesota Vocational Test for Clerical
Workers, Manual of Directions. New York: The Psychological Corp., 522 Fifth
Ave., 1939.

(5) Dvorak, B. J.: “Differential Occupational Ability Patterns.” Bulletin
of the Employment Stabilization Research Institute. Minneapolis:
University of Minnesota Press, Vol. III, No. 3, 1935, pp. 1-40.

(6) Green, H. J., Berman, I. R., Paterson, D. G., and Trabue, M. R.: “A Manual
of Selected Occupational Tests for Use in Public Employment Offices.”
Bulletin of the Employment Stabilization Research Institute. Minneapolis:
University of Minnesota Press, Vol. II, No. 3, 1933, pp. 1-33.

(7) Loevinger, J.: An Analysis of Verbal and Numerical Abilities at the Junior
High School Level. M.S. Thesis, University of Minnesota Library, 1938.

(8) Paterson, D. G., and Langlie, T. A.: “The Influence of Sex on Scholarship
Ratings.” Educ. Admin. and Supervis., Vol. XII, 1926, pp. 458-469.

(9) Schneidler, G. G.: “Grade and Age Norms for the Minnesota Vocational Test
for Clerical Workers.” Educ. and Psychol. Meas., Vol. I, No. 2, April,
1941.

(10) Schneidler, G. G.: Further Studies in Clerical Aptitude. Ph.D. Thesis,
University of Minnesota Library, 1940.

(11) Thatcher, M.: Sex Differences in Clerical Ability at the Fifth and Sixth-Grade
Levels. Unpublished, Dept. of Psychol., University of Minnesota, 1940.





CONTRASTING APPROACHES IN PROBLEM-SOLVING 


S. S. SARGENT
Barnard College 


Some time ago, while working on a study of the disarranged word or 
anagram problem, the writer discovered that subjects use two rather 
well differentiated approaches or work methods. These were called 
the “whole approach” and the “part approach.”* The former refers
to a type of behavior in which the subject considers the pattern of dis- 
arranged letters as a whole, without analyzing or recombining them, to 
see if the solution word will emerge. The part approach designates a 
more or less analytical procedure, or at least a trial-and-error one, in 
which letters are combined to form syllables which might serve as 
prefixes, suffixes, and the like in order to redintegrate the solution 
word. As can be seen, there is some analogy between these approaches
and Heidbreder’s “spectator and participant behavior.”¹ Most subjects,
though not aware of it, utilize both procedures when they work at
these problems; they look first at the disarranged word for about five to 
fifteen seconds, taking in the letters, possibly pronouncing them, and 
wondering what the solution can be. Then, if no solution comes to 
them, they start making combinations and continue doing so until they 
either get the solution, or, if they work long enough without success, 
some mild form of experimental neurosis! 

Of several dozen subjects used in the above experiment, only one 
showed originality in varying her work methods. This subject seemed 
to be more successful when she changed to the whole approach after 
failing with the part procedure. 

Being interested in work methods and individual differences in work 
methods, the writer decided to try to control the subject’s approach in 
order to see what effect it would have on performance in this problem. 
He was especially interested to discover whether the whole approach 
might be more efficient in solving rather difficult word problems than is 
the usual trial-and-error procedure, and also whether an alternation 
of the two methods might be more effective than either one singly. 

The subjects were twenty Barnard College students, studied indi- 
vidually for a total of about two hours of working time apiece, and 
Barnard, and Columbia extension students, divided into four groups, 
who solved two lists of the word problems in a group experiment. 

In the individual experiment the material consisted of three lists, 
each containing twelve words, equated for difficulty word for word in a
previous study. (See Table I.) The letters composing each word
were printed in India ink on a small white card. A pack of twelve 
such cards was placed face down in front of a subject, who turned them 
over one at a time to work upon a problem. In order to have fairly 
complete records of problem-solving behavior, each subject was trained 
to introspect out loud; i.e., to “talk out” as she worked, by giving her
a practice series of ten words. By the end of this time the subjects 
were able to verbalize their reactions quite naturally, though, of course, 
such reports did not constitute a complete record of mental activity. 
In the experimental series, time for the solution of each word was taken, 
and all the talked-out reports noted in protocols. 


TABLE I.—DISARRANGED WORDS USED IN TEST*

List 1      List 2      List 3
THWGIE      NPEHPA      EEPYLS
ESONRA      SEAXLU      CLTEMAI
PEORDI      CUMTOS      AGAGRE
CEPART      GINREFO     EIVARR
GAESAV      BAIHT       CMYOPAN
ETROS       ETLHHA      EONSPR
GUORP       NAETRU      UDLQII
NMGOINR     CNEGAH      RECMI
YEVER       SDURG       EDDILM
SUTCBII     SPRUUE      NSRWAE
TMPYE       LYREWEJ     LOPPEE
AIMANL      UCTRK       IPUCLB

* Solutions are:
weight      happen      sleepy
reason      sexual      climate
period      custom      garage
carpet      foreign     arrive
savage      habit       company
store       health      person
group       nature      liquid
morning     change      crime
every       drugs       middle
biscuit     pursue      answer
empty       jewelry     people
animal      truck       public
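
As a check on the transcription of Table I, the short sketch below confirms that every disarranged word is a rearrangement of its listed solution. The data are copied from the table; the function and variable names are illustrative only and are not part of the original study.

```python
from collections import Counter

# Disarranged words and their solutions, transcribed from Table I.
TABLE_I = {
    "List 1": [
        ("THWGIE", "weight"), ("ESONRA", "reason"), ("PEORDI", "period"),
        ("CEPART", "carpet"), ("GAESAV", "savage"), ("ETROS", "store"),
        ("GUORP", "group"), ("NMGOINR", "morning"), ("YEVER", "every"),
        ("SUTCBII", "biscuit"), ("TMPYE", "empty"), ("AIMANL", "animal"),
    ],
    "List 2": [
        ("NPEHPA", "happen"), ("SEAXLU", "sexual"), ("CUMTOS", "custom"),
        ("GINREFO", "foreign"), ("BAIHT", "habit"), ("ETLHHA", "health"),
        ("NAETRU", "nature"), ("CNEGAH", "change"), ("SDURG", "drugs"),
        ("SPRUUE", "pursue"), ("LYREWEJ", "jewelry"), ("UCTRK", "truck"),
    ],
    "List 3": [
        ("EEPYLS", "sleepy"), ("CLTEMAI", "climate"), ("AGAGRE", "garage"),
        ("EIVARR", "arrive"), ("CMYOPAN", "company"), ("EONSPR", "person"),
        ("UDLQII", "liquid"), ("RECMI", "crime"), ("EDDILM", "middle"),
        ("NSRWAE", "answer"), ("LOPPEE", "people"), ("IPUCLB", "public"),
    ],
}


def is_rearrangement(scrambled: str, solution: str) -> bool:
    """Return True when both strings use exactly the same letters."""
    return Counter(scrambled.lower()) == Counter(solution.lower())


# Collect any pair whose letters do not match.
mismatches = [
    (list_name, scrambled, solution)
    for list_name, pairs in TABLE_I.items()
    for scrambled, solution in pairs
    if not is_rearrangement(scrambled, solution)
]
total_words = sum(len(pairs) for pairs in TABLE_I.values())
print(total_words, mismatches)
```

An empty list of mismatches indicates that the thirty-six items, as transcribed here, are internally consistent.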


The individual subjects were divided into experimentals and controls.
The ten experimental subjects solved the initial series of twelve
words in their own way, without any suggestions from the experimenter.
Then these subjects were instructed in the nature of the whole
approach, as distinguished from the part approach, and were given two
sample words to solve in this way. They were told to work on the
word problems in the second series by looking at each one as a whole,
with no analysis or manipulation of letters permitted. For the third
series the subjects were directed to alternate their methods; i.e., to
shift from one to another approach whenever the one they were using
proved fruitless.

A control group of ten subjects was given the same three series 
of words in the same order, but these subjects were offered no sugges- 
tions at any time on procedure; they worked at all the words entirely 
in their own way. The chief purpose of the control group, of course, 
was to see if improvement occurred from practice alone, uninfluenced 
by work method. 


RESULTS 


The measure of each subject’s efficiency in solving these problems 
was the median time taken for each series of twelve words. Table II 
shows the averages of these median times for the experimental and the 
control groups in the three series: 


TABLE II.—AVERAGES OF MEDIAN TIMES REQUIRED IN SOLVING PROBLEMS

                    Experimental      Control
First series        40.4 seconds      32.7 seconds
Second series       29.9    "         44.2    "
Third series        20.4    "         30.6    "





This table shows considerable decrease in time taken to solve the 
problems by the experimental group, where methods were changed; and 
no such tendency in the control group, where methods remained the 
same. 
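
The size of the change in each group can be made explicit with a little arithmetic. The sketch below assumes that the three rows of Table II correspond, in order, to the first, second, and third series; the variable names are illustrative only.

```python
# Averages of median solution times in seconds, copied from Table II.
# Assumption: the rows are the first, second, and third series in order.
times = {
    "experimental": [40.4, 29.9, 20.4],
    "control": [32.7, 44.2, 30.6],
}


def percent_change(first: float, last: float) -> float:
    """Percentage change from the first value to the last (negative = faster)."""
    return 100.0 * (last - first) / first


# Change from the first series to the third series for each group.
changes = {
    group: round(percent_change(series[0], series[-1]), 1)
    for group, series in times.items()
}
print(changes)
```

On these figures the experimental group's time falls by roughly half between the first and third series, while the control group's time changes by only a few per cent and rises in the second series.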

However, since the number of subjects in the individual experiments
was small, an attempt was made to get results of a more statistical
nature. Four groups of students, comprising ninety-three subjects,
were used. Here two lists of twenty words each, equated for difficulty, 
were the experimental materials. The subjects were given ten minutes 
to solve as many words as possible in the first list, with no instruction 
as to method. (In two of the groups list A was used first; in the other 
two, list B.) Immediately afterwards the nature of the whole and part 
approaches was explained, and examples given on the board. The 
subjects were instructed, when taking the second list, to shift from the 
whole to the part approach, or vice versa, when they failed to make
progress with the method being used at the time. Then they were
given the second list of twenty words to work on for ten minutes. 

The group results are inconclusive. For all ninety-three subjects 
the average number of words solved in the first list was 9.98, and in 
the second (after changes in method had been suggested) 10.13. The 
difference is insignificant. Two of the groups lost by a fraction of a
point, one gained slightly, and one gained by 1.40 words. The gain in
this last group, which consisted of twenty-one subjects, yielded D/σ_D
of 2.41, which approaches statistical significance.
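
The ratio reported for the last group can be unpacked as follows. This is only a sketch: it reads the ratio as the mean gain divided by its standard error, and it uses a normal approximation for the probability, which is an assumption made here and not a computation reported in the article.

```python
import math

# Figures reported in the text for the group of twenty-one subjects.
mean_gain = 1.40        # D: average gain in words solved
critical_ratio = 2.41   # D divided by the standard error of D

# Standard error of the gain implied by the two reported figures.
sigma_d = mean_gain / critical_ratio

# Two-sided probability of a ratio at least this large under a normal
# approximation (an assumption; the article reports no probability).
p_two_sided = math.erfc(critical_ratio / math.sqrt(2))

print(round(sigma_d, 3), round(p_two_sided, 3))
```

Under these assumptions the implied standard error is a little under 0.6 words. A ratio of 2.41 would pass a conventional five per cent criterion, but it falls short of the stricter ratio of about three that was often asked for in work of this period, which may be why the gain is described only as approaching significance.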

Probably the chief reason for disagreement between individual 
and group results in this experiment is the inability to control work 
methods by the group technique; it is impossible to know how much, if 
at all, the subjects really do vary their procedures in the second test.
Working with the individual subject, however, a fairly accurate picture 
of work methods can be obtained from the talked-out data. 

In the experiments on thinking and problem-solving too little 
attention, in the writer’s opinion, has been given to individual differ- 
ences in the behavior of subjects. Furthermore, these individual 
differences should be studied both quantitatively and qualitatively, to 
give an accurate picture.² By means of the measures of time and
accuracy, and the data in the protocols, it was possible to make such an 
attempt in the present experiment. 

Of the experimental group of ten individual subjects, six improved 
considerably when they turned to the whole approach, two showed no 
appreciable change, and two became substantially worse. Is there any 
indication as to why this was the case? 

We find, first, that the four subjects who improved most following a 
change in approach or procedure were slow in their performance on the 
initial series; the two subjects who became worse when they changed 
methods were faster than the average. But this does not indicate 
much, as there were two fast subjects who also improved, and one slow 
and one fast one who neither gained nor lost. 

It seemed worth while to analyze the problem-solving behavior of 
these subjects a little more closely. The writer found in his previous 
investigation of the disarranged word problem³ that there are several
important aspects of thinking or problem-solving behavior which 
make for success with anagram problems solved by the usual trial-and- 
error procedure. These are responsiveness to cues hit upon during 
manipulation of letters (e.g. ing in nmgoinr, per in peordi); limitation 
of hypotheses to letters given (e.g. diaper or peoria are not far off as
guessed solutions for peordi, but poison, report, priory, and people are
far-fetched); use of analysis (e.g. choosing ing as a common suffix, or
reasoning that the g might be silent in the solution of thwgie); versatility
(speed and fruitfulness in setting up hypotheses); lack of susceptibility
to interference and perseveration (not being bothered by the
persistence or recurrence of unwanted combinations).

A comparison was made between the averages of the four subjects 
who improved most with the whole approach, and the two who became 
worse when applying it, which is shown in Table III.


TABLE III.—COMPARISON OF SUBJECTS WHO IMPROVED AND THOSE WHO GOT
WORSE WITH THE WHOLE APPROACH

                                          Four who        Two who got
                                          improved        worse
[label illegible]                         5.25 words      3 words
[label illegible]                         3.5    "        0    "
[label illegible]                         2.5    "        3    "
Versatility (hypotheses per minute)       4.6             4.8
[label illegible]                         3.25   "        [illegible]
[label illegible]                         4.00   "        1.5  "





The less efficient problem-solving behavior of the first four subjects 
seems to be at least in large part a function of poor responsiveness to 
cues, inability to limit hypotheses to the letters presented, and sus- 
ceptibility to interference and perseveration. When these subjects 
turned to the whole approach, these aspects of behavior were not 
involved. Instead, the essential seemed to be the maintenance of an 
attitude of receptiveness or readiness for sudden perceptual reorganiza- 
tion—the exact picture is hard to see because the introspections of the 
subjects throw little light on it. In any event, these subjects were not 
at such a disadvantage here. All of them found they were able to 
maintain the whole approach without reverting to combinations, and 
two of them were aware of almost immediate improvement in their 
performance and commented upon it. 

On the contrary, the two subjects who were superior in the respects
mentioned above found greater difficulty in assuming and maintaining
the whole approach. It was very difficult for them to avoid manipulating
and combining the letters into syllables as they tried to gaze at the
whole letter pattern without analysing. One reported, “The more I
focus on the word as a whole, the more inadequate I feel.” They
seemed more used to analytical methods and were greatly disturbed 
by the change. One of these subjects (who had played anagrams a 
great deal before) was so upset that her median time was raised from 
eight seconds in the initial series to forty-eight seconds in the whole 
series. Both of them disliked the whole approach and reported they 
felt more efficient when using analytical and trial-and-error procedure. 

Again, however, it should be pointed out that some of the subjects 
who were effective or ineffective at part procedure showed no change in 
efficiency when they changed methods. Apparently a subject need 
not improve or become worse after changing procedures; but there 
does seem to be some tendency for those who are not adept at a given 
method to benefit through a change to another approach; and, con- 
trariwise, for those who are effective to be disturbed. This question 
of interrelationship among work methods is an important one which 
deserves to be investigated further. 

In the third series of words the subjects were told to alternate 
between the whole and part approaches whenever they chose. The 
protocols show that they alternated at intervals of about a minute. 
Most of them spent more time with the part approach, whether or not 
it had produced superior results in the first two series. In all cases the 
subjects did as well or better in the third series than they had done in 
the first and second. Their subjective preferences agreed with these 
results, for all of them favored an alternation of methods as the best 
procedure for them to follow, rather than adhering exclusively to either 
the whole or the part procedure. 

Are there only two work methods used in this problem? As far 
as the experimenter has been able to observe, any approaches or pro- 
cedures manifested did fit into one or the other of these categories. 
There were, of course, individual variations. Some subjects verbalized 
more than others. With the part approach, as already suggested, 
some subjects sought to form usual syllables—or small meaningful 
words—while others made combinations of letters more or less at 
random. While utilizing the whole approach most subjects found a 
fixation point, usually at the middle of the word, and often held out the 
card at arm’s length, as if to prevent seeing single letters too clearly. 
One or two individuals found it possible to change their point of fixa- 
tion several times without slipping over into trial-and-error combining 
of letters. In all cases, however, the part approach involved manipulating
and reconstructing the letters presented, while the whole
approach avoided this by focussing attention upon the pattern of letters,
with a state of readiness for sudden emergence of the solution.

It is perhaps of interest to note that a change from the part to the 
whole approach helped or hindered subjects in solving, not the easiest 
or the hardest of the words, but those of medium difficulty. The 
easiest words are solved by almost everyone very rapidly, by a sort of 
immediate reorganization of the letters; and the hardest ones (requir- 
ing in the experiment over five minutes to solve) practically always 
involved combining or manipulating of letters if they were to be solved 
at all. But the words taking between ten seconds and five minutes for 
solution showed the following results when methods were changed: 
The four subjects who improved most took seventy-three seconds 
on the average in the initial series; they required only twenty-eight 
seconds when the whole approach was applied. The subjects who lost in
efficiency took thirty seconds in the first series but needed forty-three
seconds when following the whole procedure. The disarranged words 
of medium difficulty, avoiding the two extremes, provide the clearest 
illustrations of the effects of changed procedures upon problem-solving 
behavior. 

It would be interesting to know whether, and to what extent, whole 
and part approaches are found in other types of problems, such as 
analogies, syllogisms, picture puzzles, geometric originals, and the like. 
If they are, what happens when work methods are changed or alter- 
nated? Most important of all, which method or methods are most 
effective for most subjects in a given kind of problem? 

Neither statistical nor objective laboratory methods will suffice 
for this investigation. Instead, there must be many extended individ- 
ual studies, involving some form of that still suspect technique— 
introspection. Out of such research may come much of theoretical and 
practical value. 


BIBLIOGRAPHY 


1. Heidbreder, E. “An experimental study of thinking.” Arch. Psychol., 1924,
No. 73.

2. Sargent, S. S. “How shall we study individual differences?” Psychol.
Review, Vol. XLIX, 1942, pp. 170-181.

3. Sargent, S. S. “Thinking processes at various levels of difficulty.” Arch.
Psychol., 1940, No. 249.








VALIDATION OF THE SIMPLIFIED METHOD 
FOR SCORING THE STRONG VOCATIONAL 
INTEREST BLANK FOR MEN 
LEONARD KOGAN AND FREDERICK GEHLMANN 
University of Rochester 


In September, 1940, Peterson and Dunlap (1, 2) reported on a simplified
method for machine-scoring the Strong Vocational Interest Blank for 
Men (Revised-Form M) before the American Psychological Associa- 
tion. Their method consisted essentially of scoring the four hundred 
items of the blank by using unit weights of +1, 0, and —1 rather than 
the weights of Strong which range from +4 to —4. This new tech- 
nique appeared to be so economical in terms of time and effort and yet 
to yield scores and letter-grades so faithful to the original, that it was 
decided to reverify the method by applying both the simplified and the 
original systems of scoring to a new group of individuals. Thus, it was 
possible to compare the scores and letter-grades obtained by using the 
simplified keys and translating tables with the scores and letter-grades 
obtained by the original keys. 
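
The idea of unit weights can be illustrated with a small sketch. It assumes that each original weight is collapsed to +1, 0, or -1 according to its sign; the text above says only that unit weights replace weights ranging from +4 to -4, so this rule, the five item weights, and the response pattern are all hypothetical and are not taken from any published key.

```python
# Hypothetical weights for five items on one occupational key
# (illustrative values only; not from Strong's published keys).
original_weights = [4, -3, 0, 1, -1]


def to_unit_weight(weight: int) -> int:
    """Collapse a weight in the range -4..+4 to -1, 0, or +1 by its sign.

    Assumption: the simplified key keeps only the sign of each weight.
    """
    return (weight > 0) - (weight < 0)


unit_weights = [to_unit_weight(w) for w in original_weights]

# Hypothetical answers: 1 means the keyed response was marked for that item.
responses = [1, 1, 0, 1, 0]

# Score under each system is the sum of weights for the marked responses.
original_score = sum(w * r for w, r in zip(original_weights, responses))
simplified_score = sum(w * r for w, r in zip(unit_weights, responses))
print(unit_weights, original_score, simplified_score)
```

The two scores lie on different scales, which is why translating tables are needed to convert simplified scores to letter-grades before the two systems are compared.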

The group employed in this study consisted of two hundred eight 
first-year men students at the College of Arts and Science of the Univer- 
sity of Rochester. The Interest Blanks were administered to each of 
these individuals during the Fall of 1940 and then scored in the Inter- 
national Business Machines Test Scoring Machine both by means of 
Strong’s original keys and by the simplified keys. Since fourteen 
occupations were thus analyzed, a total of two hundred eight times 
fourteen or 2912 scores resulted for each of the methods. 

The correlations between the two sets of scores were then deter- 
mined for the fourteen occupations. These correlations are listed in 
Table I. They were found to range from .957 to .989 with a median r 
of .976. The scores were also translated to letter-grades by both the 
original and simplified tables. The comparisons of the original and 
simplified letter-grades are given in Table II. Of the six possible letter
grades (A, B+, B, B-, C+, C) it is seen that 2161 of the 2912, or
74.2 per cent, yielded the same letter grade by the simplified as by the
original method; seven hundred twenty-one or 24.76 per cent shifted a
half step; twenty-eight or 0.96 per cent shifted a whole step. Only two
scores of the 2912 scores shifted more than a full letter-grade.
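
The percentages in this paragraph follow directly from the counts, as the short check below shows. The counts are those given in the text and in Table II; the category names are illustrative only.

```python
# Counts of scores by size of letter-grade shift, from the text and Table II.
counts = {
    "same grade": 2161,
    "half step": 721,
    "whole step": 28,
    "more than a whole step": 2,
}

total = sum(counts.values())

# Each count expressed as a percentage of all scores, to two decimals.
percentages = {
    label: round(100 * n / total, 2) for label, n in counts.items()
}
print(total, percentages)
```

The four counts account for all 2912 scores, and the whole-step shifts come to 0.96 per cent, in agreement with Table II.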

As pointed out in the report of Peterson and Dunlap these shifts are 
negligible for the most part. The only possibly objectionable shift is 











TABLE I.—THE CORRELATIONS BETWEEN THE SIMPLIFIED AND ORIGINAL SCORES
N = 208

OCCUPATION                       CORRELATION, ORIGINAL vs. SIMPLIFIED
[illegible]                      .957
[illegible]                      .982
Life insurance salesman          .986
Math-science teacher             .980
Personnel manager                .966
[illegible]                      .963
Psychologist                     .967
Purchasing agent                 .976
Social science teacher           .971
[illegible]                      .989
[illegible]                      .981
[illegible]                      .973
[illegible]                      .977
[illegible]                      .985

Range                            .957 to .989
Median                           .976
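
The range and median given in Table I can be recomputed from the fourteen coefficients as transcribed. The sketch below uses only the values in the table; the variable names are illustrative.

```python
import statistics

# The fourteen correlations listed in Table I, in table order.
correlations = [
    0.957, 0.982, 0.986, 0.980, 0.966, 0.963, 0.967,
    0.976, 0.971, 0.989, 0.981, 0.973, 0.977, 0.985,
]

lowest = min(correlations)
highest = max(correlations)

# With an even number of values the median is the mean of the middle two.
median_r = statistics.median(correlations)
print(lowest, highest, round(median_r, 4))
```

With fourteen values the median falls midway between .976 and .977, so the reported median of .976 agrees with the table to within rounding.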


TABLE II.—ANALYSIS OF THE DIFFERENCES IN LETTER-RATINGS OF OCCUPATIONAL
INTEREST WHEN SCORED BY THE SIMPLIFIED AND ORIGINAL METHODS
Total Number of Scores = 2912

                               No. scores       Number of changes of
Occupation                     [same letter-    ½ letter-   1 letter-   1½ letter-
                               grade]           grade       grade       grade
[illegible]                    120              69          17          2
[illegible]                    135              71          2           0
Life insurance salesman        170              38          0           0
Math-science teacher           155              53          0           0
Personnel manager              141              63          4           0
[illegible]                    148              60          0           0
Psychologist                   155              52          1           0
Purchasing agent               156              52          0           0
Social-science teacher         144              62          2           0
[illegible]                    167              41          0           0
[illegible]                    176              30          2           0
[illegible]                    140              68          0           0
[illegible]                    172              36          0           0
[illegible]                    182              26          0           0

Total                          2161             721         28          2

Per cent of total scores       74.2             24.76       0.96        0.07



















that from B+ to a lower category, since most counsellors regard only A 
and B+ as being worthy of positive advisory consideration. Shifts 
upward from B to B+ merely mean that more emphasis is given to the 
vocations involved and probably are not seriously detrimental. The 
so-called critical shifts from B+ to B are presented in Table III. It 
was found that one hundred one cases of the 2912, or 3.47 per cent,
shifted in the critical direction. The original investigation likewise 
reported about 3 per cent critical shifts. It was again found that 
15.6 per cent of the cases yielded grades in the B level and hence, if 
extreme conformity to the original scores is desired for these occupa- 
tions, these papers might be rescored by the original keys. 


TABLE III.—THE NUMBER OF INDIVIDUALS RATED B BY THE SIMPLIFIED METHOD
BUT B+ BY THE ORIGINAL METHOD, TOGETHER WITH THE NUMBER OF INDIVIDUALS
WHO RECEIVED A B RATING ON THE SIMPLIFIED METHOD

                               Number of changes,      Number rated B
Occupation                     original B+ to          (simplified method)
                               simplified B
[illegible]                    17                      26
[illegible]                    10                      25
Life insurance salesman        3                       31
Math-science teacher           5                       32
Personnel manager              5                       34
[illegible]                    2                       45
Psychologist                   11                      39
Purchasing agent               10                      34
Social-science teacher         11                      40
[illegible]                    7                       14
[illegible]                    8                       31
[illegible]                    5                       30
[illegible]                    4                       45
[illegible]                    3                       29

Total                          101                     455
Total scores                   2912                    2912

Per cent changes               3.47                    Per cent scores 15.6
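
The totals and percentages in Table III can be checked from the per-occupation counts. The figures below are copied from the table in row order; the variable names are illustrative only.

```python
# Per-occupation counts from Table III, in table order.
critical_shifts = [17, 10, 3, 5, 5, 2, 11, 10, 11, 7, 8, 5, 4, 3]
rated_b = [26, 25, 31, 32, 34, 45, 39, 34, 40, 14, 31, 30, 45, 29]

total_scores = 208 * 14  # 208 students scored on fourteen occupations

total_shifts = sum(critical_shifts)
total_rated_b = sum(rated_b)

# Each total as a percentage of all scores.
pct_shifts = round(100 * total_shifts / total_scores, 2)
pct_rated_b = round(100 * total_rated_b / total_scores, 1)
print(total_shifts, pct_shifts, total_rated_b, pct_rated_b)
```

The column totals of 101 and 455 reproduce the table, and the corresponding percentages are 3.47 and 15.6 of the 2912 scores.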











The simplified method of scoring seems to be entirely corroborated 
by the present independent study. High correlations were found 
between the original and simplified scores, and the letter-grade ratings 
were likewise found to correspond quite closely. Since the simplified 
method demands less than half the time and labor—both theoretically
and empirically—that is necessitated by the original method of
machine-scoring, its advantages are obvious. 


REFERENCES 


(1) Dunlap, Jack W.: “Simplification of the Scoring of the Strong Vocational 
Interest Blank.” Psych. Bull., Vol. XXXVII, No. 7, 1940, p. 450.

(2) Peterson, Bertha M. and Dunlap, Jack W.: “A simplified method for scoring 
the Strong Vocational Interest Blank.” Journal of Consulting Psychology, 
Vol. 5, No. 6, 1941, 269-274. 













